We derive the maximum bias functions of the MM-estimates and
the constrained M-estimates or CM-estimates of regression and compare
them to the maximum bias functions of the S-estimates and the
$\tau$-estimates of regression. In these comparisons, the CM-estimates
tend to exhibit the most favorable bias-robustness properties. Also, 
under the Gaussian model, it is shown how one can construct a 
CM-estimate which has a smaller maximum bias function than a 
given S-estimate, that is, the resulting CM-estimate dominates the 
S-estimate in terms of maxbias and, at the same time, is considerably 
more efficient. 

1. Introduction. An important consideration for any estimate is an
understanding of its robustness properties. Different measures exist which try
to reflect the general concept known as robustness. One such measure is the
maximum bias function, which measures the maximum possible bias of an
estimate under $\varepsilon$-contamination. In this paper, we study the maximum bias
functions for the MM-estimates and the constrained M-estimates or
CM-estimates of regression and compare them to the maximum bias functions
for the S-estimates and the $\tau$-estimates of regression.

The maximum bias functions for Rousseeuw and Yohai's [10] S-estimates
of regression were originally derived by Martin, Yohai and Zamar [7] under
the assumption that the independent variables follow an elliptical
distribution and that the intercept term is known. More recently, Berrendero
and Zamar [1] derived the maximum bias functions for the S-estimates of
regression under much broader conditions. Further general results on the
maximum bias functions can be found in [4]. The method used in [1] applies
to a wide class of regression estimates. For example, it allows one to obtain
the maximum bias functions of Yohai and Zamar's [12] $\tau$-estimates of
regression. Unfortunately, it does not apply to Yohai's [11] MM-estimates of
regression, arguably the most popular high breakdown point estimates of re- 
gression. The MM-estimates, for example, are the default robust regression 
estimates in S-PLUS. 

The original motivation for the current paper was thus to derive the 
maximum bias functions of the MM-estimates of regression and compare 
them to the maximum bias functions of the S-estimates and $\tau$-estimates
of regression. A lesser known high breakdown point estimate of regression, 
namely Mendes and Tyler's [8] constrained M-estimates of regression (or 
CM-estimates for short), has also been included in the study since the as- 
sociated maximum bias functions can be readily obtained by applying the 
general method given in [1]. Expressions for the maximum bias functions of 
the MM-estimates and the CM-estimates are derived in Sections 3 and 4. 
Comparisons between the S-, $\tau$-, MM- and CM-estimates based on biweight
score functions are given in Section 5. It turns out that in these comparisons, 
the CM-estimates tend to exhibit the most favorable robustness properties. 

Consequently, a more detailed theoretical comparison between the maxi- 
mum bias functions of the S-estimates and the CM-estimates of regression, 
which helps explain the computational comparisons made in Section 5, is 
given in Section 6. In particular, under the Gaussian model, it is shown 
how one can construct a CM-estimate of regression so that its maximum 
bias function dominates that of a given S-estimate of regression. That is, 
the maximum bias function of the CM-estimate is smaller for some level of 
contamination $\varepsilon$ and is never larger for any value of $\varepsilon$. The S-estimate is
thus said to be bias-inadmissible at the Gaussian model. 

Section 2 reviews the notion of the maximum bias function in the re- 
gression setting, as well as the definitions of the S-estimates, the MM- 
estimates and the CM-estimates for regression. Technical proofs are given 
in the Appendix. 

2. The regression model and the concept of maximum bias. We follow
the general setup given in [7]. Specifically, we consider the linear regression
model

$$ y = \alpha_0 + \mathbf{x}'\theta_0 + u, \tag{2.1} $$

where $y \in \mathbb{R}$ represents the response, $\mathbf{x} = (x_1, x_2, \ldots, x_p)' \in \mathbb{R}^p$ represents a
random vector of explanatory variables, $\alpha_0 \in \mathbb{R}$ and $\theta_0 \in \mathbb{R}^p$ are the true
intercept and slope parameters, respectively, and the random error term $u \in \mathbb{R}$
is assumed to be independent of $\mathbf{x}$. Let $F_0$ and $G_0$ represent the distribution
functions of $u$ and $\mathbf{x}$, respectively, and let $H_0$ represent the corresponding
joint distribution function of $(y, \mathbf{x})$. The following assumptions on the
distribution $H_0$ are assumed throughout the paper:

A1. $F_0$ is absolutely continuous with density $f_0$ which is symmetric,
continuous and strictly decreasing on $\mathbb{R}^+$;
A2. $P_{G_0}(\mathbf{x}'\theta = c) < 1$ for any $\theta \in \mathbb{R}^p$, $\theta \neq 0$, and $c \in \mathbb{R}$.

As in [7] and [1], we focus on the estimation of the slope parameters $\theta_0$.
One reason for doing so is that once given a good estimate of the slope
parameters, the problem of estimating the intercept term and the residual
scale reduces to the well-studied univariate location and scale problem.
Let $T$ represent some $\mathbb{R}^p$-valued functional defined on $\mathcal{H}$, a space of
distribution functions on $\mathbb{R}^{p+1}$ which includes some weak neighborhood of $H_0$,
such that $T(H_0) = \theta_0$. For sufficiently large $n$, $\mathcal{H}$ almost surely contains
the empirical distribution function $H_n$ corresponding to a random sample
$\{(y_1, \mathbf{x}_1), \ldots, (y_n, \mathbf{x}_n)\}$ from $H_0$. Furthermore, we assume that $T$ is weakly
continuous at $H_0$, and so the statistic $T_n = T(H_n)$ is a consistent estimate
of $\theta_0$.

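All functionals $T$ considered in this paper are regression equivariant, as
defined, for example, in [7]. For such functionals, a natural invariant measure
of the "asymptotic" bias of $T$ at $H$ is given by

$$ b_{\Sigma_0}(T, H) = \begin{cases} \{(T(H) - \theta_0)'\,\Sigma_0\,(T(H) - \theta_0)\}^{1/2}, & H \in \mathcal{H}, \\ \infty, & H \notin \mathcal{H}. \end{cases} \tag{2.2} $$

Here, $\Sigma_0 = \Sigma(G_0)$ is taken to be an affine equivariant scatter matrix for the
regressors $\mathbf{x}$ under $G_0$. We can thus presume, without loss of generality, that
$(\alpha_0, \theta_0) = 0$ and $\Sigma_0 = I$. Hence, the asymptotic bias of $T$ at $H$ becomes the
Euclidean norm of $T$,

$$ b(T, H) = \begin{cases} \|T(H)\|, & H \in \mathcal{H}, \\ \infty, & H \notin \mathcal{H}, \end{cases} \tag{2.3} $$

where $\mathcal{H}$ is the class of distributions such that $\|T(H)\| < \infty$. The maximum
asymptotic bias of $T$ over $\varepsilon$-contaminated neighborhoods of $H_0$, that
is, $\mathcal{V}_\varepsilon = \{H \mid H = (1 - \varepsilon)H_0 + \varepsilon H^*,\ H^* \in \mathcal{H}^*\}$, where $\mathcal{H}^*$ is the set of all
distribution functions on $\mathbb{R}^{p+1}$, is defined to be

$$ B_T(\varepsilon) = \sup\{b(T, H) \mid H \in \mathcal{V}_\varepsilon\} \tag{2.4} $$

and the asymptotic breakdown point is subsequently defined to be

$$ \varepsilon^* = \inf\{\varepsilon \mid B_T(\varepsilon) = \infty\}. \tag{2.5} $$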
From an applied perspective, regardless of $\Sigma_0$, it may be of interest to
derive upper bounds for the Euclidean distance between $T(H)$ and $\theta_0$, that
is, for $\|T(H) - \theta_0\|$. This measure is referred to as a bias bound by Berrendero
and Zamar in [1], wherein they use it for adjusting confidence intervals for
$\theta$ to include the possibility of bias introduced by a contaminated model.
Note that the bias bound is regression and scale equivariant, but not affine 
equivariant, and hence is not directly related to the maximum bias (2.4). In 
[1] some results are given for computing bias bounds, taking the maximum 
bias function as a starting point. 



2.1. M-estimates with general scale. The S-, MM- and CM-estimates of
regression all lie within the class of M-estimates with general scale
considered in [7]. An M-estimate, or, more appropriately, an M-functional, with
general scale for the regression parameters $\alpha_0$ and $\theta_0$, say $t(H)$ and $T(H)$,
respectively, can be defined as the solution which minimizes

$$ E_H\,\rho\!\left(\frac{y - a - \mathbf{x}'\theta}{\sigma(H)}\right) \tag{2.6} $$

over all $a \in \mathbb{R}$ and $\theta \in \mathbb{R}^p$, where $\rho$ is some nonnegative symmetric
function and $\sigma(H)$ is some scale functional. The scale functional $\sigma(H)$ may be
determined simultaneously or independently of $\{t(H), T(H)\}$. We assume
throughout the paper that $\sigma(H)$ is regression invariant and residual scale
equivariant, again as defined, for example, in [7]. Throughout, it is assumed
that the function $\rho$ satisfies the following conditions:

A3. (i) $\rho$ is symmetric and nondecreasing on $[0, \infty)$ with $\rho(0) = 0$;

(ii) $\rho$ is bounded with $\lim_{u \to \infty} \rho(u) = 1$;

(iii) $\rho$ has only a finite number of discontinuities.

If the function $\rho$ is also differentiable, then $\{t(H), T(H)\}$ is a solution to
the $p+1$ simultaneous M-estimating equations

$$ E_H\left[\psi\!\left(\frac{y - t(H) - \mathbf{x}'T(H)}{\sigma(H)}\right)\begin{pmatrix} 1 \\ \mathbf{x} \end{pmatrix}\right] = 0, \tag{2.7} $$

where $\psi(u) \propto \rho'(u)$. By Condition A3(i), $\psi$ is an odd function, nonnegative
on $[0, \infty)$. Condition A3(ii) implies that these M-estimates are redescending,
that is, $\psi(u) \to 0$ as $u \to \infty$. A popular choice of M-estimates is Tukey's
biweight M-estimates, which correspond to choosing $\rho(u)$ to be

$$ \rho_T(u) = \begin{cases} 3u^2 - 3u^4 + u^6, & |u| \le 1, \\ 1, & |u| > 1. \end{cases} \tag{2.8} $$

Note that this gives rise to the biweight $\psi$ function $\psi_T(u) = u\{(1 - u^2)_+\}^2$.
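As a small numerical illustration of (2.8), and not part of the original development, the biweight loss and score can be coded directly. The function names and the optional tuning constant `k` (so that the loss is evaluated at u/k) are assumptions made for this sketch only.

```python
def rho_biweight(u: float, k: float = 1.0) -> float:
    """Tukey biweight loss (2.8) evaluated at u/k; it equals 1 once |u| >= k."""
    v2 = min(abs(u) / k, 1.0) ** 2
    return 3.0 * v2 - 3.0 * v2 ** 2 + v2 ** 3


def psi_biweight(u: float, k: float = 1.0) -> float:
    """Biweight score v * ((1 - v^2)_+)^2 at v = u/k, proportional to the derivative of the loss."""
    v = u / k
    return v * max(0.0, 1.0 - v * v) ** 2
```

With `k = 1` the derivative of `rho_biweight` equals `6 * psi_biweight` on (-1, 1), which is the proportionality $\psi \propto \rho'$ used in the text; both functions are constant outside [-k, k], which is the redescending property.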
The S-estimates for the intercept, slopes and scale are collectively defined
to be the solution $\{t_S(H), T_S(H), \sigma_S(H)\}$ to the problem of minimizing $\sigma$
subject to the constraint

$$ E_H\,\rho\!\left(\frac{y - a - \mathbf{x}'\theta}{\sigma}\right) \le b \tag{2.9} $$

for some fixed value $b$, $0 < b < 1$. The breakdown point of the S-estimate
of regression is $\varepsilon^* = \min\{b, 1 - b\}$. A drawback to the S-estimates is that
the tuning constant $b$ not only determines the breakdown point, but it also
determines the efficiency of the estimate. To obtain a reasonable efficiency
under a normal error model, one must usually decrease the breakdown point
substantially.

This problem with tuning the S-estimates of regression motivated Yohai
[11] to introduce the MM-estimates of regression which can be tuned to have
high efficiency under normal error while simultaneously maintaining a high
breakdown point. Let $\rho_1$ and $\rho_2$ be a pair of loss functions satisfying A3 and
with $\rho_1 \ge \rho_2$. Set $b = E_{F_0}\rho_1(Y)$. MM-estimates are collectively defined to be
the solution $\{t_{MM}(H), T_{MM}(H)\}$ which minimizes

$$ E_H\,\rho_2\!\left(\frac{y - a - \mathbf{x}'\theta}{s(H)}\right), $$

where $s(H) = \sigma_S(H)$ is a preliminary S-functional of scale, as defined above,
based on $\rho = \rho_1$. The breakdown point of the MM-estimates depends only
on $\rho_1$ and is given by $\varepsilon^* = \min\{b, 1 - b\}$. On the other hand, their asymptotic
distribution is determined exclusively by $\rho_2$. This allows the MM-estimates
to be tuned so that they possess both high breakdown point and high
efficiency.

The CM-estimates are another class of regression estimates which can be
tuned to have high efficiency at the normal model while maintaining a high
breakdown point. The CM-estimates for the intercept, slopes and scale are
collectively defined to be the solution $\{t_{CM}(H), T_{CM}(H), \sigma_{CM}(H)\}$ which
minimizes

$$ c\,E_H\,\rho\!\left(\frac{y - a - \mathbf{x}'\theta}{\sigma}\right) + \log \sigma \tag{2.10} $$

subject to the constraint (2.9), where $c > 0$ represents a tuning constant. As
with the S-estimates of regression, the asymptotic breakdown point of the
CM-estimates of regression is $\varepsilon^* = \min\{b, 1 - b\}$. Unlike the S-estimates of
regression, though, the CM-estimates of regression can be tuned by means
of the constant $c$ in order to obtain a reasonably high efficiency without
affecting the breakdown point.

We again emphasize that our focus here is on the slope functionals $T(H)$,
rather than on the intercept functionals $t(H)$ or the scale functionals $\sigma(H)$.

J. R. BERRENDERO, B. V. M. MENDES AND D. TYLER 



Given a good slope functional, one may wish to consider the wider range of 
location and scale functionals based on the distribution of $y - \mathbf{x}'T(H)$ such
as its median and median absolute deviation, rather than those arising from 
an S-, MM- or CM-estimate of regression. 
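To make the roles of the S-constraint and the CM objective of this subsection concrete, the following sketch works with a fixed vector of residuals in place of the distribution $H$. It solves the sample analogue of "mean of $\rho$(residual / s) = b" for the M-scale by bisection, and evaluates the sample analogue of the CM objective "c times mean $\rho$ plus log s". The helper names, the bisection scheme and the example residuals are assumptions for illustration; they are not the authors' algorithm, and no minimization over the regression parameters is attempted.

```python
import math
from typing import Sequence


def rho_biweight(u: float) -> float:
    """Tukey biweight loss, equal to 1 for |u| >= 1."""
    v2 = min(abs(u), 1.0) ** 2
    return 3.0 * v2 - 3.0 * v2 ** 2 + v2 ** 3


def mean_rho(residuals: Sequence[float], s: float) -> float:
    """Sample version of E rho(r / s); nonincreasing in s."""
    return sum(rho_biweight(r / s) for r in residuals) / len(residuals)


def m_scale(residuals: Sequence[float], b: float = 0.5, iterations: int = 200) -> float:
    """Solve mean_rho(residuals, s) = b for s by bisection."""
    nonzero = sum(1 for r in residuals if r != 0.0) / len(residuals)
    if nonzero <= b:
        raise ValueError("fraction of nonzero residuals must exceed b")
    hi = max(abs(r) for r in residuals)
    while mean_rho(residuals, hi) > b:   # enlarge until the mean drops to b or below
        hi *= 2.0
    lo = hi
    while mean_rho(residuals, lo) < b:   # shrink until the mean reaches b or above
        lo /= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_rho(residuals, mid) > b:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cm_objective(residuals: Sequence[float], s: float, c: float) -> float:
    """Sample version of the CM objective c * E rho(r / s) + log s."""
    return c * mean_rho(residuals, s) + math.log(s)


def satisfies_constraint(residuals: Sequence[float], s: float, b: float = 0.5) -> bool:
    """Sample version of the S-constraint E rho(r / s) <= b."""
    return mean_rho(residuals, s) <= b


residuals = [-2.0, -1.0, -0.5, 0.2, 0.3, 1.5, 4.0, 10.0]
scale = m_scale(residuals)
```

Because the mean of $\rho$ is nonincreasing in s, the constraint is satisfied exactly for scales at or above the M-scale, which is why the CM-estimate can trade a larger scale for a smaller value of the objective.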

3. Maximum bias functions. 

3.1. Maximum bias functions for MM-estimates. If $F_{H,a,\theta}$ is the
distribution function of the absolute residuals $|y - a - \mathbf{x}'\theta|$, then Berrendero and
Zamar [1] give an expression for the maximum bias function for any estimate
whose definition can be expressed in the form

$$ \{t(H), T(H)\} = \arg\min_{(a, \theta)} J(F_{H,a,\theta}), \tag{3.11} $$

where $J(F)$ is a functional possessing certain monotonic properties. The S-,
$\tau$- and CM-estimates are of this form. Applications of their general results
to the S- and $\tau$-estimates are given in [1]. Application of these results to the
CM-estimates is presented in Section 3.2.

The MM-estimates, however, cannot be expressed in the form (3.11), so
a different approach is needed in order to study their bias behavior. Let
$B_{MM}(\varepsilon)$ be the maximum bias function of an MM-estimate of regression.
In this subsection, lower and upper bounds for $B_{MM}(\varepsilon)$ are obtained under
quite general conditions. In some important cases, these two bounds are
equal and thereby allow for the exact determination of the maximum
bias function.

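Let $\underline{s} = \inf_{H \in \mathcal{V}_\varepsilon} s(H)$, $\overline{s} = \sup_{H \in \mathcal{V}_\varepsilon} s(H)$ and

$$ m(t, s) = \inf_{\|\theta\| = t}\ \inf_{a \in \mathbb{R}}\ E_{H_0}\,\rho_2\!\left(\frac{y - a - \mathbf{x}'\theta}{s}\right) - E_{H_0}\,\rho_2\!\left(\frac{y}{s}\right). $$

The following two functions play a key role in the developments below:

$$ h_1(t) = m(t, \overline{s}) \quad \text{and} \quad h_2(t) = \inf_{\underline{s} \le s \le \overline{s}} m(t, s). \tag{3.12} $$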

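Theorem 3.1. Let $T_{MM}$ be an MM-estimate of the regression slopes
with loss functions $\rho_i$, $i = 1, 2$, satisfying A3. Assume that the maximum bias
function of the S-estimate with score function $\rho_1$, $B_S(\varepsilon)$, satisfies
$B_S(\varepsilon) < h_1^{-1}[\varepsilon/(1 - \varepsilon)]$. Under A1 and A2, the maximum bias function of $T_{MM}$,
$B_{MM}(\varepsilon)$, satisfies

$$ h_1^{-1}\!\left(\frac{\varepsilon}{1 - \varepsilon}\right) \le B_{MM}(\varepsilon) \le h_2^{-1}\!\left(\frac{\varepsilon}{1 - \varepsilon}\right). \tag{3.13} $$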
Note that the condition $B_S(\varepsilon) < h_1^{-1}[\varepsilon/(1 - \varepsilon)]$ of the above theorem,
together with (3.13), implies that $B_S(\varepsilon) < B_{MM}(\varepsilon)$. This condition usually
holds for an appropriately chosen $\rho_1$ function. Thus, an MM-estimate does
not improve upon the maximum bias of the initial S-estimate. The trade-off,
though, is that with an appropriately chosen $\rho_2$ function, the MM-estimate
can greatly improve upon the efficiency of the initial S-estimate.

Upper and lower bounds for the maximum bias of MM-estimates have also
been obtained, respectively, by Hennig [5], Theorem 3.1, and Martin, Yohai
and Zamar [7], Lemma 4.1, under the assumption of unimodal elliptically
distributed regressors. For this special case, the upper bounds given in (3.13)
and in [5] agree. On the other hand, the lower bound given in [7], namely
$B_{MM}(\varepsilon) \ge h_0^{-1}[\varepsilon/(1 - \varepsilon)]$, where $h_0(t) = \sup_{\underline{s} \le s \le \overline{s}} m(t, s)$, is not as tight as
that given in (3.13).

In our setup, the assumption of unimodal elliptical regressors is equivalent
to the following:

A2*. Under $G_0$, the distribution of $\mathbf{x}'\theta$ is absolutely continuous, with a
symmetric, unimodal density, and depends on $\theta$ only through $\|\theta\|$ for
all $\theta \neq 0$.

Under this condition, we can define

$$ g(s, t) = E_{H_0}\,\rho\!\left(\frac{y - \mathbf{x}'\theta}{s}\right), \tag{3.14} $$

where $\theta$ is any vector such that $\|\theta\| = t$. Under conditions A1, A2* and
A3, it is shown in Lemma 3.1 of Martin, Yohai and Zamar [7] that $g$ is
continuous, strictly increasing with respect to $\|\theta\|$ and strictly decreasing in
$s$ for $s > 0$.

If A2* holds, then $\underline{s}$ and $\overline{s}$ are defined so that $g_1(\underline{s}, 0) = b/(1 - \varepsilon)$ and
$g_1(\overline{s}, 0) = (b - \varepsilon)/(1 - \varepsilon)$, respectively, and $m(t, s) = g_2(s, t) - g_2(s, 0)$, where
$g_i(s, t)$ is defined as in (3.14) after replacing $\rho$ with $\rho_i$.

3.2. Maximum bias curves for CM-estimates. A CM-estimate of regression
$\{t_{CM}(H), T_{CM}(H)\}$ can be expressed in the form (3.11) with $J$ taken
to be

$$ J_{CM}(F) = \inf_{s \ge \sigma(F)} \{c\,E_F[\rho(y/s)] + \log s\} \tag{3.15} $$

and where $\sigma(F)$ is the M-scale defined as the solution to the equation

$$ E_F[\rho(y/\sigma(F))] = b. \tag{3.16} $$

Consequently, application of the general method in [1] for computing maximum
bias functions leads to the following result:
mum bias functions leads to the following result: 
Theorem 3.2. Let $T_{CM}$ be a CM-estimate of the regression slopes based
on a function $\rho$ satisfying A3, and suppose $H_0$ satisfies A1 and A2. Define

$$ r_{CM}(\varepsilon) = J_{CM}[(1 - \varepsilon)F_{H_0,0,0} + \varepsilon\delta_\infty] $$

and let

$$ m_{CM}(t) = \inf_{\|\theta\| = t}\ \inf_{a \in \mathbb{R}}\ J_{CM}[(1 - \varepsilon)F_{H_0,a,\theta} + \varepsilon\delta_0]. \tag{3.17} $$

Then the maximum bias function of $T_{CM}$, denoted by $B_{CM}(\varepsilon)$, is given by

$$ B_{CM}(\varepsilon) = m_{CM}^{-1}[r_{CM}(\varepsilon)]. \tag{3.18} $$

This general result can be given a simpler representation when condition
A2* also holds. In particular, in the definition of $m_{CM}(t)$, the infimum is
obtained when $a = 0$ and $\theta$ is any vector such that $\|\theta\| = t$. This gives

$$ m_{CM}(t) = \inf\{A_{c,\varepsilon}(s, t) \mid s \ge m_S(t)\}, $$

where $A_{c,\varepsilon}(s, t) = c(1 - \varepsilon)g(s, t) + \log s$ and $m_S(t) = g^{-1}(b/(1 - \varepsilon), t)$, with
$g(s, t)$ being defined as in (3.14) and $g^{-1}(\cdot, t)$ being the inverse of $g$ with
respect to $s$. Also, it is easy to verify that

$$ r_{CM}(\varepsilon) = \inf\{A_{c,\varepsilon}(s, 0) \mid s \ge r_S(\varepsilon)\} + c\varepsilon, $$

where $r_S(\varepsilon) = g^{-1}((b - \varepsilon)/(1 - \varepsilon), 0)$.

4. Maximum bias functions for two special cases. Maximum bias func- 
tions generally tend to have rather complicated expressions. Under some 
model distributions, though, these expressions can be substantially sim- 
plified. This is possible for two special cases considered here, namely the 
Gaussian and Cauchy models. These simplified expressions are useful for 
computing and comparing the maximum bias curves of various estimates 
for these models, which is done in Section 5. 

4.1. Maximum bias functions under the Gaussian model. We assume
throughout this section that both the error term and the regressor
variables arise from a multivariate normal distribution. That is,
we assume $H_0$ is the joint $N(0, I_{p+1})$ distribution and refer to this as the
Gaussian model. Let $g(s) = E_\Phi\,\rho(Z/s)$, where $Z$ is a standard normal random
variable, and define $\sigma_{b,\varepsilon} = g^{-1}[(b - \varepsilon)/(1 - \varepsilon)]$ and $\gamma_{b,\varepsilon} = g^{-1}[b/(1 - \varepsilon)]$.
Martin, Yohai and Zamar [7] show that the maximum bias function for an
S-estimate of the regression slope under the Gaussian model and based on
a function $\rho$ satisfying A3 is given by

$$ B_S(\varepsilon) = \left[\left(\frac{\sigma_{b,\varepsilon}}{\gamma_{b,\varepsilon}}\right)^2 - 1\right]^{1/2}. \tag{4.19} $$
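Formula (4.19) is straightforward to evaluate numerically. The sketch below does so for the biweight loss (2.8) with $b = 0.5$, computing $g(s)$ by Simpson's rule and inverting it by bisection; both numerical choices, and all function names, are assumptions of this illustration rather than anything prescribed in the paper. Rescaling the loss to $\rho_T(u/k)$ rescales both $\sigma_{b,\varepsilon}$ and $\gamma_{b,\varepsilon}$ by $1/k$, so the ratio in (4.19) does not depend on $k$ and the unscaled loss is used.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def rho_biweight(u: float) -> float:
    """Tukey biweight loss, equal to 1 for |u| >= 1."""
    v2 = min(abs(u), 1.0) ** 2
    return 3.0 * v2 - 3.0 * v2 ** 2 + v2 ** 3


def g_gauss(s: float, n: int = 200) -> float:
    """g(s) = E rho_T(Z/s), Z ~ N(0,1): Simpson on [0, min(s, 10)] plus P(|Z| > s)."""
    upper = min(s, 10.0)
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        z = i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * rho_biweight(z / s) * math.exp(-0.5 * z * z) / SQRT_2PI
    return 2.0 * total * h / 3.0 + math.erfc(s / math.sqrt(2.0))


def g_inverse(v: float) -> float:
    """Invert the decreasing function g on a log scale by bisection."""
    lo, hi = 1e-6, 1e6
    for _ in range(80):
        mid = math.sqrt(lo * hi)
        if g_gauss(mid) > v:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def maxbias_s_gauss(eps: float, b: float = 0.5) -> float:
    """Evaluate (4.19); returns infinity at or beyond the breakdown point min(b, 1 - b)."""
    if eps < 0.0:
        raise ValueError("eps must be nonnegative")
    if eps >= min(b, 1.0 - b):
        return math.inf
    sigma = g_inverse((b - eps) / (1.0 - eps))
    gamma = g_inverse(b / (1.0 - eps))
    return math.sqrt(max(0.0, (sigma / gamma) ** 2 - 1.0))
```

Calling `maxbias_s_gauss` over a grid of contamination levels traces out the S-estimate curve of the kind compared in Section 5.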



MAXBIAS FOR MM- AND CM-ESTIMATES 






To obtain an expression for the maximum bias function of a CM-estimate
of regression under the Gaussian model, let

$$ A_{c,\varepsilon}(s) = c(1 - \varepsilon)g(s) + \log s. \tag{4.20} $$

Also, define $D_c(\varepsilon) = \inf_{s \ge \sigma_{b,\varepsilon}} A_{c,\varepsilon}(s) - \inf_{s \ge \gamma_{b,\varepsilon}} A_{c,\varepsilon}(s)$. We then have the
following result:

Theorem 4.1. Let $T_{CM}$ be a CM-estimate of the regression slopes based
on a function $\rho$ satisfying A3 and assume $H_0$ is multivariate normal. We
then have

$$ B_{CM}(\varepsilon) = \{\exp[2c\varepsilon + 2D_c(\varepsilon)] - 1\}^{1/2}. \tag{4.21} $$
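Theorem 4.1 can likewise be evaluated numerically. The sketch below computes $D_c(\varepsilon)$ for the biweight loss by minimizing $A_{c,\varepsilon}(s)$ over a log-spaced grid; the grid search, the grid size and the function names are assumptions of this illustration. The search interval is justified by the bounds $A_{c,\varepsilon}(s) \ge \log s$ and $A_{c,\varepsilon}(\sigma_{b,\varepsilon}) \le c + \log \sigma_{b,\varepsilon}$, so no minimizer can lie above $\sigma_{b,\varepsilon} e^{c}$.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def rho_biweight(u: float) -> float:
    """Tukey biweight loss, equal to 1 for |u| >= 1."""
    v2 = min(abs(u), 1.0) ** 2
    return 3.0 * v2 - 3.0 * v2 ** 2 + v2 ** 3


def g_gauss(s: float, n: int = 200) -> float:
    """g(s) = E rho_T(Z/s), Z ~ N(0,1): Simpson on [0, min(s, 10)] plus P(|Z| > s)."""
    upper = min(s, 10.0)
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        z = i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * rho_biweight(z / s) * math.exp(-0.5 * z * z) / SQRT_2PI
    return 2.0 * total * h / 3.0 + math.erfc(s / math.sqrt(2.0))


def g_inverse(v: float) -> float:
    """Invert the decreasing function g on a log scale by bisection."""
    lo, hi = 1e-6, 1e6
    for _ in range(80):
        mid = math.sqrt(lo * hi)
        if g_gauss(mid) > v:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def maxbias_cm_gauss(eps: float, c: float, b: float = 0.5, grid: int = 600) -> float:
    """Evaluate (4.21), approximating both infima in D_c on one shared grid."""
    if eps < 0.0:
        raise ValueError("eps must be nonnegative")
    if eps >= min(b, 1.0 - b):
        return math.inf
    sigma = g_inverse((b - eps) / (1.0 - eps))
    gamma = g_inverse(b / (1.0 - eps))

    def objective(s: float) -> float:
        # A_{c,eps}(s) of (4.20)
        return c * (1.0 - eps) * g_gauss(s) + math.log(s)

    upper = sigma * math.exp(c)
    points = [gamma * (upper / gamma) ** (i / grid) for i in range(grid + 1)]
    points.append(sigma)
    values = [(s, objective(s)) for s in points]
    inf_from_gamma = min(a for _, a in values)
    inf_from_sigma = min(a for s, a in values if s >= sigma)
    d_c = inf_from_sigma - inf_from_gamma
    return math.sqrt(max(0.0, math.exp(2.0 * c * eps + 2.0 * d_c) - 1.0))
```

Since the second infimum is taken over a larger set, $D_c(\varepsilon) \ge 0$, and so the computed value can never fall below $\{\exp(2c\varepsilon) - 1\}^{1/2}$; using one grid for both infima preserves this property exactly in the approximation.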



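Turning now to the MM-estimates, let $g_i(s) = E_\Phi\,\rho_i(Z/s)$ for $i = 1, 2$,
where $Z$ is a standard normal random variable. Under the Gaussian model,

$$ m(t, s) = g_2\!\left(\frac{s}{(1 + t^2)^{1/2}}\right) - g_2(s). $$

Moreover, $\overline{s} = g_1^{-1}[(b - \varepsilon)/(1 - \varepsilon)]$ and $\underline{s} = g_1^{-1}[b/(1 - \varepsilon)]$. Since $\rho_1$ is the same
$\rho$-function used in defining the preliminary S-estimate, we have $\overline{s} = \sigma_{b,\varepsilon}$ and
$\underline{s} = \gamma_{b,\varepsilon}$. Hence, $B_{MM}(\varepsilon) \ge h_1^{-1}[\varepsilon/(1 - \varepsilon)] = \ell(\varepsilon)$, where

$$ \ell(\varepsilon) = \left[\left(\frac{\sigma_{b,\varepsilon}}{g_2^{-1}[g_2(\sigma_{b,\varepsilon}) + \varepsilon/(1 - \varepsilon)]}\right)^2 - 1\right]^{1/2}. \tag{4.22} $$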

A simpler form for the upper bound, which can be used for computational
purposes, can be obtained under some additional regularity conditions on
$g_2$. These conditions hold in most cases of interest.

A4. (i) $g(s)$ is continuously differentiable;

(ii) $\phi(s) = -s\,g'(s)$ is unimodal, with its maximum being obtained at
$\sigma_M$. Set $K = \phi(\sigma_M)$.

Theorem 4.2. In addition to the assumptions of Theorem 3.1, suppose
that $g_2(s)$ satisfies A4. Then when $H_0$ is multivariate normal,

$$ \ell(\varepsilon) \le B_{MM}(\varepsilon) \le \max\{\ell(\varepsilon), u(\varepsilon)\}, $$

where $\ell(\varepsilon)$ is given in (4.22) and

$$ u(\varepsilon) = \left[\left(\frac{\gamma_{b,\varepsilon}}{g_2^{-1}[g_2(\gamma_{b,\varepsilon}) + \varepsilon/(1 - \varepsilon)]}\right)^2 - 1\right]^{1/2}. $$
The upper bound in Theorem 4.2 coincides with that obtained by Hennig
[5]. However, the tighter lower bound gives us further insight into the
maximum bias and enables us to determine when the bounds are actually an
equality. Obviously, if $\varepsilon$ is such that $u(\varepsilon) \le \ell(\varepsilon)$, then $B_{MM}(\varepsilon) = \ell(\varepsilon)$. This
occurs in many important cases for a wide range of $\varepsilon$ values.

As an example, consider the biweight loss function $\rho_T$ defined by (2.8).
If we choose $\rho_1(u) = \rho_T(u/k_1)$ and $\rho_2(u) = \rho_T(u/k_2)$ with tuning constants
$k_1 = 1.56$ and $k_2 = 4.68$, respectively, and choose $b = 0.5$, then the resulting
MM-estimate has a 50% breakdown point and is asymptotically 95% efficient
under the Gaussian model. For this case, it can be verified that the condition
$B_S(\varepsilon) < h_1^{-1}[\varepsilon/(1 - \varepsilon)]$ in Theorem 3.1 holds. From (4.22), it can be noted
that this condition is equivalent to $g_2(\gamma_{b,\varepsilon}) - g_2(\sigma_{b,\varepsilon}) < \varepsilon/(1 - \varepsilon)$. It can also
be verified that the corresponding $\phi_2$ function is unimodal. A plot of $\phi_2$ is
displayed in the upper panel of Figure 1. The bounds given in Theorem 4.2
for this MM-estimate are displayed in the lower panel of Figure 1. Both
bounds coincide for values of $\varepsilon$ up to approximately 0.33. Hence, for such $\varepsilon$,
the exact value of the maximum bias function is known.
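The bounds of Theorem 4.2 for this biweight example can be reproduced numerically. In the sketch below, $g_i(s) = g_T(k_i s)$, where $g_T$ denotes the Gaussian expectation of the unscaled biweight loss, so that $g_i^{-1}(v) = g_T^{-1}(v)/k_i$. The Simpson and bisection routines, and the function names, are assumptions of this illustration; the function returns $B_S(\varepsilon)$ alongside the lower bound $\ell(\varepsilon)$ and the upper bound $\max\{\ell(\varepsilon), u(\varepsilon)\}$.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def rho_biweight(u: float) -> float:
    """Tukey biweight loss, equal to 1 for |u| >= 1."""
    v2 = min(abs(u), 1.0) ** 2
    return 3.0 * v2 - 3.0 * v2 ** 2 + v2 ** 3


def g_gauss(s: float, n: int = 200) -> float:
    """g_T(s) = E rho_T(Z/s), Z ~ N(0,1): Simpson on [0, min(s, 10)] plus P(|Z| > s)."""
    upper = min(s, 10.0)
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        z = i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * rho_biweight(z / s) * math.exp(-0.5 * z * z) / SQRT_2PI
    return 2.0 * total * h / 3.0 + math.erfc(s / math.sqrt(2.0))


def g_inverse(v: float) -> float:
    """Invert the decreasing function g_T on a log scale by bisection."""
    lo, hi = 1e-6, 1e6
    for _ in range(80):
        mid = math.sqrt(lo * hi)
        if g_gauss(mid) > v:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def mm_bounds_gauss(eps: float, k1: float = 1.56, k2: float = 4.68, b: float = 0.5):
    """Return (B_S, lower bound, upper bound) for the biweight MM-estimate."""
    if not 0.0 <= eps < min(b, 1.0 - b):
        raise ValueError("eps must lie in [0, min(b, 1 - b))")
    delta = eps / (1.0 - eps)
    sigma = g_inverse((b - eps) / (1.0 - eps)) / k1
    gamma = g_inverse(b / (1.0 - eps)) / k1

    def bound_at(s: float) -> float:
        # [(s / g_2^{-1}[g_2(s) + delta])^2 - 1]^{1/2}, infinite if the inverse does not exist
        target = g_gauss(k2 * s) + delta
        if target >= 1.0:
            return math.inf
        shrunk = g_inverse(target) / k2
        return math.sqrt(max(0.0, (s / shrunk) ** 2 - 1.0))

    bias_s = math.sqrt(max(0.0, (sigma / gamma) ** 2 - 1.0))
    lower = bound_at(sigma)
    upper = max(lower, bound_at(gamma))
    return bias_s, lower, upper
```

When the two returned bounds are equal, the maximum bias of the MM-estimate is known exactly at that contamination level, and comparing the first two returned values checks the condition of Theorem 3.1 numerically.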



4.2. Maximum bias functions under the Cauchy model. We now assume
that the error term and the regressors follow independent Cauchy
distributions rather than normal distributions. That is, we assume $x_1, \ldots, x_p$ and
$y$ have independent standard Cauchy distributions. For brevity, we refer to
this distributional model as the Cauchy model. Note that in this case, the
distribution of the regressors is not elliptically symmetric. The derivations
for the Cauchy model closely follow those given for the Gaussian model.

Let $g(s) = E\,\rho(Z/s)$, where $Z$ is now a standard Cauchy random
variable, and, again, let $\sigma_{b,\varepsilon} = g^{-1}[(b - \varepsilon)/(1 - \varepsilon)]$ and $\gamma_{b,\varepsilon} = g^{-1}[b/(1 - \varepsilon)]$. In
the Appendix, we show the maximum bias function for an S-estimate of
regression to be

$$ B_S(\varepsilon) = \frac{\sigma_{b,\varepsilon}}{\gamma_{b,\varepsilon}} - 1 \tag{4.23} $$

and for a CM-estimate of regression to be

$$ B_{CM}(\varepsilon) = \exp\{D_c(\varepsilon) + c\varepsilon\} - 1, \tag{4.24} $$

with $D_c(\varepsilon)$ being analogous to its definition given after equation (4.20).
Upper and lower bounds for the maximum bias function for the MM-estimates
of regression are shown in the Appendix to be

$$ \ell(\varepsilon) \le B_{MM}(\varepsilon) \le \max\{\ell(\varepsilon), u(\varepsilon)\}, \tag{4.25} $$

where

$$ \ell(\varepsilon) = \frac{\sigma_{b,\varepsilon}}{g_2^{-1}[g_2(\sigma_{b,\varepsilon}) + \varepsilon/(1 - \varepsilon)]} - 1 \quad \text{and} \quad u(\varepsilon) = \frac{\gamma_{b,\varepsilon}}{g_2^{-1}[g_2(\gamma_{b,\varepsilon}) + \varepsilon/(1 - \varepsilon)]} - 1. $$

The conditions given in (4.19), Theorem 4.1 and Theorem 4.2 for the
Gaussian model are also being assumed here for (4.23), (4.24) and (4.25),
respectively, for the Cauchy model. For an MM-estimate of regression, condition
A4 can again be shown to hold when using a biweight loss function.

It is somewhat surprising that the expressions for $B_S(\varepsilon)$, $B_{CM}(\varepsilon)$ and
$B_{MM}(\varepsilon)$ are of order $O(\varepsilon)$ as $\varepsilon \to 0$ under the Cauchy model, in contrast
with the usual $\sqrt{\varepsilon}$ order. This is not, however, a contradiction of known
results which establish general $\sqrt{\varepsilon}$ order for the maximum bias functions
of regression estimates based on residuals since such results require either
elliptical regressors, as in Yohai and Zamar [13], or the existence of second
moments for the regressors, as in He [3] or Yohai and Zamar [14].
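The difference in order can be seen numerically for the S-estimate by comparing (4.19) and (4.23) at two small contamination levels. The sketch below uses the biweight loss with $b = 0.5$; the quadrature, the bisection range and the function names are assumptions of this illustration. If the bias is of order $\sqrt{\varepsilon}$, the ratio of the bias to $\sqrt{\varepsilon}$ is roughly constant, whereas if it is of order $\varepsilon$, the ratio of the bias to $\varepsilon$ is roughly constant.

```python
import math


def rho_biweight(u: float) -> float:
    """Tukey biweight loss, equal to 1 for |u| >= 1."""
    v2 = min(abs(u), 1.0) ** 2
    return 3.0 * v2 - 3.0 * v2 ** 2 + v2 ** 3


def simpson(f, a: float, b: float, n: int = 400) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def g_gauss(s: float) -> float:
    """E rho_T(Z/s) for standard normal Z."""
    density = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    inner = simpson(lambda z: rho_biweight(z / s) * density(z), 0.0, min(s, 10.0))
    return 2.0 * inner + math.erfc(s / math.sqrt(2.0))


def g_cauchy(s: float) -> float:
    """E rho_T(Z/s) for standard Cauchy Z."""
    inner = simpson(lambda z: rho_biweight(z / s) / (math.pi * (1.0 + z * z)), 0.0, s)
    return 2.0 * inner + 1.0 - 2.0 * math.atan(s) / math.pi


def inverse(g, v: float) -> float:
    """Invert a decreasing function g on [1e-3, 1e3] by bisection on a log scale."""
    lo, hi = 1e-3, 1e3
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if g(mid) > v:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def scale_ratio(g, eps: float, b: float = 0.5) -> float:
    """sigma_{b,eps} / gamma_{b,eps} for the given model."""
    return inverse(g, (b - eps) / (1.0 - eps)) / inverse(g, b / (1.0 - eps))


def maxbias_s_gauss(eps: float) -> float:
    return math.sqrt(scale_ratio(g_gauss, eps) ** 2 - 1.0)   # (4.19)


def maxbias_s_cauchy(eps: float) -> float:
    return scale_ratio(g_cauchy, eps) - 1.0                  # (4.23)


gauss_small, gauss_large = maxbias_s_gauss(0.01), maxbias_s_gauss(0.02)
cauchy_small, cauchy_large = maxbias_s_cauchy(0.01), maxbias_s_cauchy(0.02)
```

Doubling $\varepsilon$ from 0.01 to 0.02 should multiply the Gaussian-model bias by roughly $\sqrt{2}$ and the Cauchy-model bias by roughly 2.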

5. Maximum bias curve comparisons. 

5.1. The Gaussian model. Most estimators need to be tuned so that they 
perform reasonably well under some important model, as well as being ro- 
bust to deviations from the model. In practice, one often tunes an estimate 
so that it has good efficiency under the Gaussian model as well as a high 
breakdown point. For smooth p-functions, both the MM- and CM-estimates 
of regression can be tuned to have a 50% breakdown point and 95% asymp- 
totic relative efficiency under the Gaussian model. This is also true for the 
class of $\tau$-estimates; see Yohai and Zamar [12] for details. Thus, these
estimates cannot be ranked on the basis of their efficiency and breakdown point
alone. Comparing their maximum bias behavior under the Gaussian model 
gives further insight into how these estimates are affected by deviations from 
the model. 

Here, we again consider the estimates associated with the family of Tukey's 
biweight loss function (2.8). The 95% efficient biweight MM-estimate with 
a 50% breakdown point was discussed in the previous subsection. A 95% 
efficient biweight CM-estimate with a 50% breakdown point is obtained by 
choosing $\rho(u) = \rho_T(u)$, $b = 0.5$ and the tuning constant $c = 4.835$; see [8] for
details. In contrast, a 95% efficient biweight S-estimate of regression has a
12% breakdown point, whereas a biweight S-estimate with a 50% breakdown
point is only 28.7% efficient under the Gaussian model. 

Figure 2 displays the maximum bias functions under the Gaussian
model of the MM-, CM- and $\tau$-estimates based on biweight functions and
tuned so that they have 95% (asymptotic) efficiency under the Gaussian
model and a 50% breakdown point, as well as the 95% efficient biweight
S-estimate. We observe that up to $\varepsilon \approx 0.28$, the $\tau$-estimate has a larger bias
than the MM-estimate and then a smaller bias afterward. The $\tau$-estimate,
though, has a larger bias than the CM-estimate over essentially the entire
range of $\varepsilon$. Up to $\varepsilon \approx 0.20$, MM- and CM-estimates are roughly equivalent,
although for larger fractions of contamination, the CM-estimate is clearly
better.

Fig. 2. Maximum bias functions for a biweight S-estimate (dashed line), MM-estimate
(dotted line, lower bound), $\tau$-estimate (solid line) and CM-estimate (dashed-dotted line).
All of the estimates have 95% efficiency under the Gaussian model. The S-estimate has a
breakdown point of 12%, whereas the others have a 50% breakdown point.

As a further comparison, Figure 3 again shows the maximum bias function
under the Gaussian model of the above 95% efficient biweight MM- and
CM-estimates, as well as of the less efficient 50% breakdown point biweight
S-estimate. Also included in Figure 3 is the biweight CM-estimate having a
50% breakdown point and an asymptotic relative efficiency of 61.1% under
the Gaussian model, which corresponds to choosing the tuning constant
$c = 2.568$. (The efficiency of the CM-estimate based on a biweight function
with $b = 1/2$ and $c = 2.568$ under the Gaussian model is incorrectly reported
as 28.7% rather than 61.1% in Table 1 of Mendes and Tyler [8]. The rest of
Table 1 of [8] is correct.)

Fig. 3. Maximum bias functions for a biweight S-estimate (solid line), MM-estimate
(dotted line, lower bound), and two CM-estimates (dotted-dashed line and solid line). The
plots for the S-estimate and the second CM-estimate are almost identical. All estimates
have a 50% breakdown point. The MM-estimate and the first CM-estimate (dotted-dashed
line) have 95% efficiency under the Gaussian model. The second CM-estimate (solid line)
has an efficiency of 61.1%, whereas the efficiency of the S-estimate is 28.7%.

The maximum bias of the 95% efficient MM-estimate is uniformly larger
than that of the corresponding S-estimate. This is consistent with the
general result given in Theorem 3.1. The increase in bias for the MM-estimate
is compensated by its increase in efficiency. A curious observation, though, is
that for large fractions of contamination, the maximum bias of the 95%
efficient CM-estimate is lower than that of the 28.7% efficient S-estimate.
Furthermore, the maximum bias of the 61.1% efficient CM-estimate is almost
identical to (and as shown theoretically in the next section, is never larger
than) that of the 28.7% efficient S-estimate. That is, there is no trade-off
between increased efficiency and maximum bias for this CM-estimate
relative to the S-estimate. In practice, given that the maximum bias function
of the 95% efficient CM-estimate does not greatly differ from that of the
61.1% estimate, the 95% efficient estimate would be preferable.

5.2. The Cauchy model. We now consider the maximum bias behavior
of S-, MM- and CM-estimates under the Cauchy model. Figure 4 shows
the maximum bias functions under the Cauchy model for the MM- and
CM-estimates which are 95% efficient under the Gaussian model, as well
as for the 28.7% efficient biweight S-estimate and the 61.1% efficient
CM-estimate discussed in Section 5.1. The breakdown point of each of these
estimates remains 50% under the Cauchy model. The estimates, though,
are not retuned here for the Cauchy model. Rather, our goal is to make
further comparisons between the same estimates. In practice, given a specific
estimate, one would wish to evaluate its robustness properties under various
scenarios. From Figure 4, it can be noted that the 95% efficient CM-estimate
tends to have the better maximum bias behavior under the Cauchy model,
even better than that of the 61.1% efficient CM-estimate.

Fig. 4. Maximum bias functions for a biweight S-estimate (solid line), an MM-estimate
(dotted line, lower bound) and two CM-estimates (dotted-dashed line and solid line). The
plots for the S-estimate and the second CM-estimate are almost identical. All estimates
have a 50% breakdown point under the Cauchy model. The MM-estimate and the first
CM-estimate have 95% efficiency under the Gaussian model, whereas the second
CM-estimate and the S-estimate have efficiencies of 61.1% and 28.7%, respectively, under the
Gaussian model.

5.3. Other considerations. Aside from maximum bias functions, a
classical way of evaluating the robustness of an estimate under deviations from
normality is to consider its efficiency under other distributions. The
asymptotic efficiencies under the Gaussian model discussed in Section 5.1 depend
on the distribution of the error term being normal. They do not, however,
depend on the distribution of the carriers being normal. Rather, the only
assumption needed for the carriers is that they possess second moments. This
is also true for the asymptotic efficiencies at other symmetric error
distributions; see, for example, Maronna, Bustos and Yohai [6]. In particular, the
authors note that the asymptotic variance-covariance matrix of $T_n$ has
the form $\sigma_u^2\,\Sigma_x^{-1}$, where $\Sigma_x$ is the variance-covariance matrix of the carriers
$\mathbf{x}$ and $\sigma_u^2$ depends only on the distribution of the error term $u$.

Table 1
Asymptotic variances and residual gross error sensitivities of some S-, MM- and
CM-estimates of regression under symmetric error distributions

               NORM      SL     CAU      T3      DE      CN     UNIF
S95   AVAR    1.053   1.798   2.209   1.257   1.429   1.091    0.771
      RGES    1.770   3.277   3.716   2.146   2.258   1.942    1.415
MM95  AVAR    1.053   1.230   1.312   1.221   1.368   1.087    0.713
      RGES    1.770   2.146   2.243   1.953   2.038   1.844    1.548
CM95  AVAR    1.053   1.159   1.202   1.227   1.396   1.088    0.755
      RGES    1.770   1.995   2.061   1.988   2.138   1.835    1.439
CM61  AVAR    1.637   1.330   1.059   2.091   1.528   2.891    1.128
      RGES    1.838   1.900   1.765   2.285   2.045   2.619    1.405
S28   AVAR    3.484   1.330   1.059   2.091   1.528   2.891  120.336
      RGES    2.850   1.900   1.765   2.285   2.045   2.619   15.621

In Table 1, we again consider the 95% efficient biweight S-, MM- and
CM-estimates, the 28.7% efficient biweight S-estimate and the 61.1%
efficient CM-estimate discussed in Section 5.1, where the efficiency is taken
under a normal error model. These estimates are labeled S95, MM95, CM95,
S28 and CM61, respectively. For these estimates, we compute their
asymptotic variances $\sigma_u^2$ (AVAR) under a variety of symmetric error models.
Besides the standard normal (NORM), these models include the slash (SL),
the Cauchy (CAU), the $t_3$-distribution (T3), the double-exponential (DE),
a 90-10% mixture of a standard normal and a normal with mean zero and
variance 9 (CN) and the uniform distribution on $(-1, 1)$ (UNIF). Each of
these distributions is normalized so that its interquartile range is equal to
that of the standard normal, namely 1.3490. This corresponds to
multiplying the SL, CAU, T3, DE, CN or UNIF random variables by 0.4587, 0.6745,
0.8818, 0.9731, 0.9248 and 1.3490, respectively. Also included in Table 1 are
the residual gross error sensitivities (RGES); see Hampel et al. [2]. Formulas
for AVAR and RGES can be found in [8].
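The normalizing constants quoted above can be checked directly: each is the upper quartile of the standard normal divided by the upper quartile of the corresponding distribution. The sketch below does this with closed-form distribution functions and bisection. The closed forms used for the slash (a standard normal divided by an independent uniform on (0, 1)) and for the $t_3$ distribution are standard, but the parametrizations, helper names and bisection bracket are assumptions of this illustration.

```python
import math
from statistics import NormalDist

PHI = NormalDist()
Q_NORMAL = PHI.inv_cdf(0.75)   # upper quartile of N(0, 1), half of the interquartile range


def upper_quartile(cdf, lo: float = 0.0, hi: float = 50.0) -> float:
    """Solve cdf(q) = 0.75 by bisection for a symmetric distribution (q > 0)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.75:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cdf_slash(x: float) -> float:
    """Slash: Z / U with Z ~ N(0,1) and U ~ Uniform(0,1), for x > 0."""
    return PHI.cdf(x) - (PHI.pdf(0.0) - PHI.pdf(x)) / x


def cdf_t3(x: float) -> float:
    """Student t with 3 degrees of freedom."""
    r = x / math.sqrt(3.0)
    return 0.5 + (r / (1.0 + r * r) + math.atan(r)) / math.pi


CDFS = {
    "SL": cdf_slash,
    "CAU": lambda x: 0.5 + math.atan(x) / math.pi,
    "T3": cdf_t3,
    "DE": lambda x: 1.0 - 0.5 * math.exp(-x),                 # Laplace with unit scale, x >= 0
    "CN": lambda x: 0.9 * PHI.cdf(x) + 0.1 * PHI.cdf(x / 3.0),
    "UNIF": lambda x: min(1.0, 0.5 * (x + 1.0)),
}


def normalization_factors() -> dict:
    """Multipliers that give each distribution the interquartile range of N(0, 1)."""
    return {name: Q_NORMAL / upper_quartile(cdf) for name, cdf in CDFS.items()}


factors = normalization_factors()
```

The entries of `factors` can be compared with the six multipliers listed in the text; twice `Q_NORMAL` is the common interquartile range.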

From Table 1, it can be noted that the estimates MM95 and CM95
behave similarly with respect to asymptotic variance and residual gross
error sensitivity, with CM95 being slightly better at the longer-tailed slash
and Cauchy distributions and MM95 being slightly better at the more
moderate $t_3$ and double-exponential distributions. Both MM95 and CM95
perform better than S95 at longer-tailed distributions. The behavior of S28
and CM61 is the same, except at the normal and uniform distributions. At
longer-tailed distributions, equality tends to hold for the constraint (2.9) on
CM61 and so as an estimate, it is asymptotically equivalent to S28 under
these distributions. Under the normal and the uniform distributions, there
is a considerable difference in favor of CM61. Curiously, the behavior of S28
and CM61 under the Cauchy distribution is better than that of MM95 and
CM95. However, based on the overall behavior of the asymptotic variances
and residual gross error sensitivities alone, either MM95 or CM95 would
be preferable in practice.

6. Bias-inadmissibility of S-estimates under the Gaussian model. Throughout
this section, the Gaussian model is presumed to hold, that is, it is presumed
that both the response and the carriers are normal. In Section 5.1,
it was noted that under the Gaussian model, the maximum bias function of
the 61.1% efficient biweight CM-estimate is never larger than that of the
28.7% efficient biweight S-estimate. In this section, we verify this result
theoretically, rather than computationally. Moreover, we note that this result
is not specific to the use of the biweight estimates. In general, we show
that for a given S-estimate, it is usually possible to tune the corresponding
CM-estimates (through the value of $c$) so that $B_{CM}(\varepsilon) \le B_S(\varepsilon)$ for all
$\varepsilon$, with strict inequality for at least one value of $\varepsilon$. In such a case, we
will say that with respect to the maximum bias criterion, the estimate $T_S$ is
inadmissible under the Gaussian model since it can be dominated by $T_{CM}$.

To show this, we need to carefully compare the maximum bias functions
of the CM-estimates and the S-estimates. An alternative representation for
$B_{CM}(\varepsilon)$ in terms of $B_S(\varepsilon)$ under the normal model [see equations (4.21) and
(4.19)] is given by

$$\log[1 + B_{CM}^2(\varepsilon)] = \log[1 + B_S^2(\varepsilon)] + 2 d_c(\varepsilon), \tag{6.26}$$

where $d_c(\varepsilon) = h_c(\varepsilon, \gamma_{b,\varepsilon}) - h_c(\varepsilon, \sigma_{b,\varepsilon})$ and

$$h_c(\varepsilon, \sigma) = A_{c,\varepsilon}(\sigma) - \inf_{s \ge \sigma} A_{c,\varepsilon}(s). \tag{6.27}$$

The functionals $T_{CM}$ and $T_S$ in (6.26) are understood to be defined by using
the same $\rho$ and the same value of $b$. From representation (6.26), it is apparent
that what we need to consider is the sign of $d_c(\varepsilon)$ in terms of $c$ and $\varepsilon$. The
following result represents a first step in determining appropriate values of
the tuning constant $c$ necessary for showing the bias inadmissibility of an
S-estimate. The value of $K$ below is defined within condition A4.

Theorem 6.1. Suppose that $\rho$ is such that conditions A3 and A4 hold.
Then:

(i) if $c \le 1/K$, then $B_{CM}(\varepsilon) = B_S(\varepsilon)$ for all $\varepsilon$;

(ii) for any $\varepsilon$ such that $c > c(\varepsilon) = \varepsilon^{-1}\log(\sigma_{b,\varepsilon}/\gamma_{b,\varepsilon})$, we have
$B_{CM}(\varepsilon) > B_S(\varepsilon)$.

As a consequence, for the CM-estimate to improve upon the maximum
bias function of the S-estimate, one needs to choose $c > 1/K$. On the other
hand, if $c_0 = \inf\{c(\varepsilon) : 0 < \varepsilon < b\}$, then we also need to choose $c \le c_0$. This
range is not empty since, as shown in the Appendix,

$$1/K < (1-b)/K + b/\phi(\gamma_{b,0}) \le c_0, \tag{6.28}$$

where $\phi(s)$ is defined within condition A4.
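
The threshold $c(\varepsilon)$ in Theorem 6.1(ii) can be traced to a one-line computation. The sketch below assumes that (4.20), which is not restated in this section, has the form $A_{c,\varepsilon}(s) = c(1-\varepsilon)g(s) + \log s$ with $g(s) = E_{H_0}\rho(y/s)$, and it uses the defining relations $g(\sigma_{b,\varepsilon}) = (b-\varepsilon)/(1-\varepsilon)$ and $g(\gamma_{b,\varepsilon}) = b/(1-\varepsilon)$ that appear in the proof of Theorem 6.1:

```latex
\begin{align*}
A_{c,\varepsilon}(\gamma_{b,\varepsilon}) - A_{c,\varepsilon}(\sigma_{b,\varepsilon})
  &= c(1-\varepsilon)\left[\frac{b}{1-\varepsilon} - \frac{b-\varepsilon}{1-\varepsilon}\right]
     + \log\frac{\gamma_{b,\varepsilon}}{\sigma_{b,\varepsilon}} \\
  &= c\,\varepsilon - \log\frac{\sigma_{b,\varepsilon}}{\gamma_{b,\varepsilon}}
   = \varepsilon\,[\,c - c(\varepsilon)\,].
\end{align*}
```

Under that assumed form, $A_{c,\varepsilon}(\sigma_{b,\varepsilon}) < A_{c,\varepsilon}(\gamma_{b,\varepsilon})$ holds exactly when $c > c(\varepsilon)$, which is the comparison that drives part (ii).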

For $c \le 1/K$, the CM-functional is the same as the S-functional at $H_0$,
as well as at any $H$ in an $\varepsilon$-contaminated neighborhood of $H_0$. This is
because equality is obtained in the constraint (2.9) for the CM-estimate
and when equality is obtained, the CM-estimate gives the same solution as
the corresponding S-estimate. Thus, for $c \le 1/K$, the CM-estimate has the
same maximum bias function as the corresponding S-estimate. On the other
hand, for large values of $c$, the CM-estimate tends to give a solution similar
to the least squares solution, so one expects the maximum bias function
to be unacceptably large, even though the breakdown point may be close
to 1/2. In fact, one can note from (4.21) that for any $\varepsilon$, $B_{CM}(\varepsilon) \to \infty$ as
$c \to \infty$.

Varying the tuning constant $c$ may decrease the maximum bias for some
values of $\varepsilon$, while increasing the maximum bias for other values of $\varepsilon$. The
question we address now is whether it is possible to find a moderate value
of $c$ (necessarily between $1/K$ and $c_0$) such that the maximum bias function
of the CM-estimate improves upon the maximum bias function of the
S-estimate.

The following result shows that in most cases of interest, the condition
$c \le c_0$ is not only necessary, but also sufficient, to obtain $B_{CM}(\varepsilon) \le B_S(\varepsilon)$
for all $\varepsilon$. The value of $\sigma_M$ below is also defined within condition A4.

Theorem 6.2. Suppose that the assumptions of Theorem 6.1 hold. If
$c \le c_0$ and $g(\sigma_M) < b$, then $B_{CM}(\varepsilon) \le B_S(\varepsilon)$ for all $\varepsilon > 0$.

Remark 6.1. This result cannot be improved upon. That is, if $c > c(\varepsilon)$,
then $B_S(\varepsilon) < B_{CM}(\varepsilon)$ by Theorem 6.1. Also, if $c \le c_0$ and $g(\sigma_M) \ge b$, then
either $B_S(\varepsilon) < B_{CM}(\varepsilon)$ for some $\varepsilon$ or $B_S(\varepsilon) = B_{CM}(\varepsilon)$ for all $\varepsilon$. This remark
is verified in the Appendix.

In order to show that an S-estimate can be dominated by a CM-estimate
with $c$ chosen so that $1/K < c \le c_0$, it remains to be shown that for some
$0 < \varepsilon < b$, $B_{CM}(\varepsilon) < B_S(\varepsilon)$. For specific examples, this can be checked
numerically; under additional assumptions, though, it can be shown
analytically.

Theorem 6.3. Suppose that the assumptions of Theorem 6.2 hold. Furthermore,
suppose that $g(s)$ is convex and

$$\phi(\sigma_{b,0}) \ge \frac{(1-b)\,[1-g(\sigma_M)]^2}{2-[b+g(\sigma_M)]}. \tag{6.29}$$

Then for any value $c$ such that

$$c_1 = \frac{\log(\sigma_M/\sigma_{b,0})}{b-g(\sigma_M)} < c \le \frac{1}{\phi(\sigma_{b,0})},$$

the CM-estimate of regression dominates the S-estimate of regression with
respect to the maximum bias function. Furthermore, this range of values for
$c$ is not empty.

Remark 6.2. From the proof of Theorem 6.3, it follows that a condition
more general than (6.29) under which the conclusions also hold is
$c_0 = \lim_{\varepsilon\to 0^+} c(\varepsilon)$. However, (6.29) is easier to check and holds in most cases
of interest.
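
Where the right-hand side of (6.29) comes from can be sketched as follows. The computation assumes, following the proof of Theorem 6.3 in the Appendix, that the bound to be verified is $\phi(\sigma_{b,0}) \ge (1-b)^2/[(1-\varepsilon)(2-\varepsilon)]$ evaluated at $\varepsilon_M = [b-g(\sigma_M)]/[1-g(\sigma_M)]$, the contamination level at which $\sigma_{b,\varepsilon} = \sigma_M$:

```latex
\begin{align*}
1-\varepsilon_M &= \frac{1-b}{1-g(\sigma_M)}, \qquad
2-\varepsilon_M = \frac{2-[b+g(\sigma_M)]}{1-g(\sigma_M)}, \\[4pt]
\frac{(1-b)^2}{(1-\varepsilon_M)(2-\varepsilon_M)}
  &= \frac{(1-b)\,[1-g(\sigma_M)]^2}{2-[b+g(\sigma_M)]}.
\end{align*}
```

Only $b$, $g(\sigma_M)$ and $\phi(\sigma_{b,0})$ enter, so checking the condition for a given $\rho$ requires no optimization over $\varepsilon$.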

Consider the biweight S-estimate with breakdown point $b \le 1/2$. It can
be verified that the conditions of Theorem 6.3 hold whenever $b > 0.410$, so
any such biweight S-estimate is inadmissible with respect to maximum bias
under the Gaussian model. For $b = 1/2$, that is, for the 28.7% efficient
biweight S-estimate, the value of $c = 2.568$ falls within the interval given in
Theorem 6.3. Hence, the 61.1% efficient biweight CM-estimate dominates
the 28.7% efficient biweight S-estimate with respect to maximum bias under
the Gaussian model. As noted in Section 5.1, although the decrease in
maximum bias is negligible, the increase in efficiency is not.

As another example, consider the $\alpha$-quantile regression estimates. These
correspond to S-estimates with $\rho(u) = I\{|u| > 1\}$ and $b = 1 - \alpha$. It is
straightforward to verify that the conditions of Theorem 6.3 hold in this case
whenever $b > 0.3173$, so the $\alpha$-quantile regression estimates with $\alpha < 0.6827$ are
inadmissible under the Gaussian model with respect to maximum bias.
Again, the decrease in maxbias is not large. For example, in the special
case $\alpha = b = 0.5$, for which the resulting $\alpha$-quantile estimate corresponds to
Rousseeuw's [9] least median of squares estimate (LMS), the best improvement
is only 95.7% of the LMS bias.

The $\alpha$-quantile estimates are often referred to as minimax bias regression
estimates. Martin, Yohai and Zamar [7] show that within the class of
M-estimates of regression with general scale, an $\alpha$-quantile estimate minimizes
the maximum bias at $\varepsilon$, with the value of $\alpha$ depending on $\varepsilon$. Yohai and Zamar
[13] generalize this minimax result to the class of all residual admissible
estimates of regression. Under the Gaussian model, an $\alpha$-quantile estimate
can be shown to have minimax bias for some $\varepsilon$ whenever $0.500 \le \alpha < 0.6827$
or, equivalently, whenever $0.3173 < b \le 0.500$. Despite having minimax bias
under the Gaussian model for a given $\varepsilon$, these $\alpha$-quantile regression estimates
are still inadmissible under the Gaussian model with respect to maximum
bias. In particular, as shown in Theorem 6.1, for a given $\alpha$-quantile estimate
having minimax bias at $\varepsilon = \varepsilon_\alpha$, it is possible to construct a CM-estimate
which also has the same maximum bias at $\varepsilon = \varepsilon_\alpha$. Moreover, the maximum
bias of this CM-estimate is never larger than the maximum bias of the
given $\alpha$-quantile estimate at any other value of $\varepsilon \ne \varepsilon_\alpha$, and furthermore is
smaller than the maximum bias of the given $\alpha$-quantile estimate at some
values of $\varepsilon \ne \varepsilon_\alpha$. Although the decrease in the maximum bias may not be
of practical importance, these observations expose some limitations of the
notion of minimax bias.
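
The $\alpha$-quantile case with $\alpha = b = 0.5$ is simple enough to explore numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: it assumes that (4.20) has the form $A_{c,\varepsilon}(s) = c(1-\varepsilon)g(s) + \log s$, takes $g(s) = P(|y| > s)$ and $\phi(s) = -s\,g'(s)$ for standard normal $y$ (so $\sigma_M = 1$), uses $1 + B_S^2(\varepsilon) = (\sigma_{b,\varepsilon}/\gamma_{b,\varepsilon})^2$ for the S-estimate, and evaluates the CM-estimate through (6.26) and (6.27). The tuning value $c = 2.3$ and all function names are hypothetical choices for the demonstration.

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist()
SIGMA_M = 1.0                      # mode of phi for this rho


def g(s):
    """g(s) = E rho(y/s) = P(|y| > s) for rho(u) = I{|u| > 1}, y ~ N(0, 1)."""
    return 2.0 * (1.0 - N.cdf(s))


def phi(s):
    """phi(s) = -s g'(s)."""
    return 2.0 * s * N.pdf(s)


K = phi(SIGMA_M)                   # supremum of phi


def sigma(b, eps):
    """Solves g(sigma) = (b - eps) / (1 - eps)."""
    return N.inv_cdf(1.0 - (b - eps) / (2.0 * (1.0 - eps)))


def gamma(b, eps):
    """Solves g(gamma) = b / (1 - eps)."""
    return N.inv_cdf(1.0 - b / (2.0 * (1.0 - eps)))


def A(s, c, eps):
    # Assumed form of (4.20).
    return c * (1.0 - eps) * g(s) + log(s)


def inf_A_from(lo, c, eps):
    """Infimum of A over [lo, infinity)."""
    target = 1.0 / ((1.0 - eps) * c)
    if target >= K:                # no interior critical point: A is nondecreasing
        return A(lo, c, eps)
    left, right = SIGMA_M, 40.0    # phi is decreasing on [SIGMA_M, infinity)
    for _ in range(200):
        mid = 0.5 * (left + right)
        if phi(mid) > target:
            left = mid
        else:
            right = mid
    sigma_u = 0.5 * (left + right)  # upper critical point, a local minimum of A
    if lo >= sigma_u:
        return A(lo, c, eps)
    return min(A(lo, c, eps), A(sigma_u, c, eps))


def bias_s(b, eps):
    return sqrt((sigma(b, eps) / gamma(b, eps)) ** 2 - 1.0)


def bias_cm(b, c, eps):
    s, gm = sigma(b, eps), gamma(b, eps)
    h_gamma = A(gm, c, eps) - inf_A_from(gm, c, eps)   # h_c(eps, gamma)
    h_sigma = A(s, c, eps) - inf_A_from(s, c, eps)     # h_c(eps, sigma)
    d = h_gamma - h_sigma
    return sqrt(exp(log(1.0 + bias_s(b, eps) ** 2) + 2.0 * d) - 1.0)


b = 0.5
c1 = log(SIGMA_M / sigma(b, 0.0)) / (b - g(SIGMA_M))
c_upper = 1.0 / phi(sigma(b, 0.0))
print(f"1/K = {1 / K:.4f}, c1 = {c1:.4f}, 1/phi(sigma_b0) = {c_upper:.4f}")
for eps in (0.01, 0.05, 0.10, 0.20):
    print(eps, round(bias_s(b, eps), 4), round(bias_cm(b, 2.3, eps), 4))
```

With these assumptions the interval of Theorem 6.3 is nonempty for $b = 0.5$, and a value of $c$ inside it gives a maxbias curve that is never above the LMS curve and lies slightly below it for small $\varepsilon$, in line with the modest improvement described above.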

The minimax bias results given in [13] for the $\alpha$-quantile regression
estimates apply more generally than to just the Gaussian model. They also
apply to models having a symmetric unimodal error term along with
elliptically distributed carriers. Under such models, though, we conjecture that
the $\alpha$-quantile regression estimates may again be inadmissible with respect
to maximum bias, but we do not pursue this topic further here. The value of
$\alpha$ which attains the minimum maxbias at a particular $\varepsilon$ is not only
dependent on the value of $\varepsilon$, but also dependent on the particular model. That is,
a particular $\alpha$-quantile estimate is not necessarily minimax at $\varepsilon$ over a range
of models, but is only known to be minimax at $\varepsilon$ under a specific model.
Any estimate which can be shown to dominate an $\alpha$-quantile estimate would
most likely need to be model-specific.



APPENDIX 

In this section, we include proofs of the results and other technical ques- 
tions. 



Proof of Theorem 3.1. It can be shown, following the proofs of Lemmas
4, 5 and 6 in [1], that for all $s > 0$ and $t \in \mathbb{R}$, there exist $\alpha_t \in \mathbb{R}$ and
$\theta_t \in \mathbb{R}^p$ such that

$$m(t,s) = E_{H_0}\rho_2\left(\frac{y-\alpha_t-x'\theta_t}{s}\right) - E_{H_0}\rho_2\left(\frac{y}{s}\right).$$

Also, we can show that $m(t, s)$ is a strictly increasing function of $t$ for all
$s > 0$. It follows that $h_1(t)$ is also strictly increasing.

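We first show that $B_{MM}(\varepsilon) \le t_2$, where $t_2$ is such that $h_2(t_2) = \varepsilon/(1-\varepsilon)$.
Let $\theta \in \mathbb{R}^p$ be such that $t = \|\theta\| > t_2$. We shall prove that

$$E_H\rho_2\left(\frac{y-\alpha-x'\theta}{s(H)}\right) > E_H\rho_2\left(\frac{y}{s(H)}\right) \quad \text{for each } \alpha \in \mathbb{R} \text{ and } H \in V_\varepsilon. \tag{A.30}$$

Let $H = (1-\varepsilon)H_0 + \varepsilon\tilde{H}$. We have that

$$m[t, s(H)] > m[t_2, s(H)] \ge \inf_{\underline{s} \le s \le \bar{s}} m(t_2, s) = h_2(t_2) = \frac{\varepsilon}{1-\varepsilon}.$$

Therefore, for each $\alpha \in \mathbb{R}$ and $H \in V_\varepsilon$,

$$E_{H_0}\rho_2\left(\frac{y-\alpha-x'\theta}{s(H)}\right) - E_{H_0}\rho_2\left(\frac{y}{s(H)}\right) > \frac{\varepsilon}{1-\varepsilon},$$

that is,

$$(1-\varepsilon)E_{H_0}\rho_2\left(\frac{y-\alpha-x'\theta}{s(H)}\right) > (1-\varepsilon)E_{H_0}\rho_2\left(\frac{y}{s(H)}\right) + \varepsilon.$$

It follows that for every $\alpha \in \mathbb{R}$ and $H \in V_\varepsilon$,

$$E_H\rho_2\left(\frac{y-\alpha-x'\theta}{s(H)}\right) \ge (1-\varepsilon)E_{H_0}\rho_2\left(\frac{y-\alpha-x'\theta}{s(H)}\right) > (1-\varepsilon)E_{H_0}\rho_2\left(\frac{y}{s(H)}\right) + \varepsilon \ge E_H\rho_2\left(\frac{y}{s(H)}\right),$$

that is, inequality (A.30) holds. The last inequality above follows from A3(ii).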

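Next, we show that $B_{MM}(\varepsilon) \ge t_1$, where $t_1$ is such that $h_1(t_1) = \varepsilon/(1-\varepsilon)$.
Since $B_S(\varepsilon) < t_1$, we can select an arbitrary $t > 0$ such that $B_S(\varepsilon) < t < t_1$.
It is enough to show that $B_{MM}(\varepsilon) \ge t$. We know that there exist $\alpha_t \in \mathbb{R}$ and
$\theta_t \in \mathbb{R}^p$ such that

$$h_1(t) = m(t,\bar{s}) = E_{H_0}\rho_2\left(\frac{y-\alpha_t-x'\theta_t}{\bar{s}}\right) - E_{H_0}\rho_2\left(\frac{y}{\bar{s}}\right).$$

Since $h_1$ is strictly increasing, $h_1(t) < h_1(t_1) = \varepsilon/(1-\varepsilon)$. It follows that

$$(1-\varepsilon)E_{H_0}\rho_2\left(\frac{y-\alpha_t-x'\theta_t}{\bar{s}}\right) < (1-\varepsilon)E_{H_0}\rho_2\left(\frac{y}{\bar{s}}\right) + \varepsilon. \tag{A.31}$$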

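Define the following sequence of contaminating distributions: $\tilde{H}_n = \delta_{(y_n,x_n)}$,
where $x_n = n\theta_t$ and $y_n = \alpha_t + x_n'\theta_t = \alpha_t + nt^2$. Let $H_n = (1-\varepsilon)H_0 + \varepsilon\tilde{H}_n$
and $\theta_n = T(H_n)$. Suppose that $\sup_n \|\theta_n\| < t$, in order to produce a
contradiction. Under this assumption, there exists a convergent subsequence,
denoted also by $\{\theta_n\}$, such that $\lim_{n\to\infty}\theta_n = \tilde\theta$, where $\|\tilde\theta\| = \tilde{t} < t$. Assume
for a moment that the sequence of intercept functionals evaluated at $H_n$,
$\alpha_n = t(H_n)$, satisfies $\lim_{n\to\infty}|\alpha_n| = \infty$. Then

$$\begin{aligned}
\lim_{n\to\infty} E_{H_n}\rho_2\left(\frac{y-\alpha_n-x'\theta_n}{s(H_n)}\right)
  &= (1-\varepsilon) + \varepsilon\lim_{n\to\infty}\rho_2\left(\frac{y_n-\alpha_n-x_n'\theta_n}{s(H_n)}\right) \\
  &> (1-\varepsilon)\lim_{n\to\infty} E_{H_0}\rho_2\left(\frac{y-\alpha_t-x'\theta_t}{s(H_n)}\right) \\
  &= \lim_{n\to\infty} E_{H_n}\rho_2\left(\frac{y-\alpha_t-x'\theta_t}{s(H_n)}\right),
\end{aligned}$$

but this fact contradicts the definition of $(\alpha_n, \theta_n)$. Note that $0 < \underline{s} \le s(H_n) \le
\bar{s} < \infty$ implies $\lim_{n\to\infty} E_{H_0}\rho_2[(y-\alpha_t-x'\theta_t)/s(H_n)] < 1$ which, in turn,
implies the strict inequality above. Therefore, we can assume, without loss of
generality, that $\lim_{n\to\infty}\alpha_n = \tilde\alpha$ for some finite $\tilde\alpha \in \mathbb{R}$. As a consequence, we
have

$$\lim_{n\to\infty}\frac{y_n-\alpha_n-x_n'\theta_n}{s(H_n)} = \infty \quad \text{and} \quad \frac{y_n-\alpha_t-x_n'\theta_t}{s(H_n)} = 0 \ \text{ for each } n. \tag{A.32}$$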



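We now prove that $\lim_{n\to\infty} s(H_n) = \bar{s}$ for any convergent subsequence
$s(H_n)$. Let $s_\infty = \lim_{n\to\infty} s(H_n)$. Note that $\bar{s}$ satisfies the equation

$$(1-\varepsilon)E_{H_0}\rho_1(y/\bar{s}) + \varepsilon = b. \tag{A.33}$$

Let $(\gamma_n, \beta_n) = (t_1(H_n), T_1(H_n))$ be the regression S-estimate based on $\rho_1$.
We know that $\|\beta_n\| \le B_S(\varepsilon) < t$ for all $n$, so that, without loss of generality,
$\lim_{n\to\infty}\beta_n = \tilde\beta$, where $\|\tilde\beta\| < t$. Assume that $\lim_{n\to\infty}|\gamma_n| = \infty$. Since

$$E_{H_n}\rho_1\left(\frac{y-\gamma_n-x'\beta_n}{s(H_n)}\right) = b, \tag{A.34}$$

letting $n \to \infty$, it follows that

$$\begin{aligned}
b = \lim_{n\to\infty} E_{H_n}\rho_1\left(\frac{y-\gamma_n-x'\beta_n}{s(H_n)}\right)
  &= (1-\varepsilon) + \varepsilon\lim_{n\to\infty}\rho_1\left(\frac{y_n-\gamma_n-x_n'\beta_n}{s(H_n)}\right) \\
  &> (1-\varepsilon)\lim_{n\to\infty} E_{H_0}\rho_1\left(\frac{y-\alpha_t-x'\theta_t}{s(H_n)}\right) \\
  &= \lim_{n\to\infty} E_{H_n}\rho_1\left(\frac{y-\alpha_t-x'\theta_t}{s(H_n)}\right).
\end{aligned}$$

Then, for sufficiently large $n$, there exists $s_n < s(H_n)$ such that

$$E_{H_n}\rho_1\left(\frac{y-\alpha_t-x'\theta_t}{s_n}\right) = b,$$

but this fact contradicts the definition of $(\gamma_n, \beta_n)$. Therefore, we can also
assume, without loss of generality, that $\lim_{n\to\infty}\gamma_n = \tilde\gamma$ for some finite $\tilde\gamma \in \mathbb{R}$.
As a consequence, letting $n \to \infty$ in (A.34), we obtain

$$b = (1-\varepsilon)E_{H_0}\rho_1\left(\frac{y-\tilde\gamma-x'\tilde\beta}{s_\infty}\right) + \varepsilon \ge (1-\varepsilon)E_{H_0}\rho_1(y/s_\infty) + \varepsilon.$$

Comparing the last equation with (A.33), we deduce that $s_\infty \ge \bar{s}$. Since
$\bar{s} = \sup_{H \in V_\varepsilon} s(H)$, we have $s_\infty = \bar{s}$. We use this fact to obtain Equations
(A.35) and (A.36) below.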

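Equations (A.31) and (A.32) imply that

$$\begin{aligned}
\lim_{n\to\infty} E_{H_n}\rho_2\left(\frac{y-\alpha_n-x'\theta_n}{s(H_n)}\right)
  &= (1-\varepsilon)E_{H_0}\rho_2\left(\frac{y-\tilde\alpha-x'\tilde\theta}{\bar{s}}\right) + \varepsilon \\
  &\ge (1-\varepsilon)E_{H_0}\rho_2\left(\frac{y}{\bar{s}}\right) + \varepsilon \\
  &> (1-\varepsilon)E_{H_0}\rho_2\left(\frac{y-\alpha_t-x'\theta_t}{\bar{s}}\right).
\end{aligned} \tag{A.35}$$

On the other hand, applying (A.32), we have

$$\lim_{n\to\infty} E_{H_n}\rho_2\left(\frac{y-\alpha_t-x'\theta_t}{s(H_n)}\right) = (1-\varepsilon)E_{H_0}\rho_2\left(\frac{y-\alpha_t-x'\theta_t}{\bar{s}}\right). \tag{A.36}$$

Therefore, for sufficiently large $n$,

$$E_{H_n}\rho_2\left(\frac{y-\alpha_n-x'\theta_n}{s(H_n)}\right) > E_{H_n}\rho_2\left(\frac{y-\alpha_t-x'\theta_t}{s(H_n)}\right).$$

This last inequality contradicts the definition of $(\alpha_n, \theta_n)$. For every $t > 0$
such that $B_S(\varepsilon) < t < t_1$, we have found a sequence of distributions $\{H_n\}$ in
the neighborhood such that $\sup_n \|T(H_n)\| \ge t$. Therefore, $B_{MM}(\varepsilon) \ge t_1$.
□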

Proof of Theorem 3.2. It is enough to check that the functional $J(F)$
defined in (3.15) satisfies condition A1 in [1]. For instance, the monotonicity
condition A1(a) follows immediately from the monotonicity of the M-scale
$\sigma(F)$. □

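Proof of Theorem 4.1. We will apply Theorem 3.2. Let $F_0 = (1-\varepsilon)F_{H_0,0,0} + \varepsilon\delta_\infty$.
Then $\sigma(F_0) = \sigma_{b,\varepsilon}$ and

$$r_{CM}(\varepsilon) = J_{CM}(F_0) = \inf_{s \ge \sigma_{b,\varepsilon}} A_{c,\varepsilon}(s) + c\varepsilon. \tag{A.37}$$

On the other hand, if $\|\theta\| = t$ and $F_t = (1-\varepsilon)F_{H_0,0,\theta} + \varepsilon\delta_0$, then, when
$H_0$ is multivariate normal, we have $\sigma(F_t) = (1+t^2)^{1/2}\gamma_{b,\varepsilon}$ and

$$m_{CM}(t) = J_{CM}(F_t) = \tfrac{1}{2}\log(1+t^2) + \inf_{s \ge \gamma_{b,\varepsilon}} A_{c,\varepsilon}(s). \tag{A.38}$$

From (3.18), we know that $B_{CM}(\varepsilon) = t_\varepsilon$, where $m_{CM}(t_\varepsilon) = r_{CM}(\varepsilon)$. Matching
the expressions in Equations (A.37) and (A.38) and solving for $t$ yields
the result. □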
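
The matching step can be written out explicitly. The sketch below assumes that the infima in (A.37) and (A.38) run over $s \ge \sigma_{b,\varepsilon}$ and $s \ge \gamma_{b,\varepsilon}$, respectively, which is how the scale constraint enters each functional:

```latex
\begin{align*}
\tfrac12\log(1+t_\varepsilon^2) + \inf_{s\ge\gamma_{b,\varepsilon}} A_{c,\varepsilon}(s)
  &= \inf_{s\ge\sigma_{b,\varepsilon}} A_{c,\varepsilon}(s) + c\,\varepsilon, \\[4pt]
B_{CM}(\varepsilon) = t_\varepsilon
  &= \left\{\exp\!\Big[2\Big(c\,\varepsilon
     + \inf_{s\ge\sigma_{b,\varepsilon}} A_{c,\varepsilon}(s)
     - \inf_{s\ge\gamma_{b,\varepsilon}} A_{c,\varepsilon}(s)\Big)\Big] - 1\right\}^{1/2}.
\end{align*}
```

Under these assumptions the maximum bias depends on $c$ only through the two constrained infima and the linear term $c\,\varepsilon$.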



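Proof of Theorem 4.2. Let $t \in \mathbb{R}$ be arbitrary. Under the assumptions,
the function $m(t,s)$ is continuously differentiable with respect to $s$,
with derivative given by

$$\frac{\partial m(t,s)}{\partial s} = \frac{1}{s}\left[\phi_2(s) - \phi_2\left(\frac{s}{(1+t^2)^{1/2}}\right)\right].$$

Since $\phi_2(s)$ is unimodal, for each $\varepsilon$ we have that $m(t,s)$ is (a) strictly
increasing for $s \in [\underline{s}, \bar{s}]$, (b) strictly decreasing for $s \in [\underline{s}, \bar{s}]$ or (c) it has a
unique critical point $s \in (\underline{s}, \bar{s})$ which is a local maximum. In any of the three
cases, the global minimum of $m(t,s)$ for $s \in [\underline{s}, \bar{s}]$ is attained at one of the
two extremes of the interval. That is,

$$h_2(t) = \inf_{\underline{s} \le s \le \bar{s}} m(t,s) = \min\{m(t,\underline{s}),\, m(t,\bar{s})\}.$$

From Theorem 3.1, an upper bound for the maximum bias is given by the
value of $t_\varepsilon$ such that $h_2(t_\varepsilon) = \varepsilon/(1-\varepsilon)$. If $h_2(t_\varepsilon) = m(t_\varepsilon, \bar{s})$, then $h_1(t_\varepsilon) =
h_2(t_\varepsilon)$ and therefore $t_\varepsilon = l(\varepsilon)$. On the other hand, if $h_2(t_\varepsilon) = m(t_\varepsilon, \underline{s})$, then
we have that $t_\varepsilon = u(\varepsilon)$. Hence, the result follows. □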

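Proof of (4.23). We apply Theorem 1 from [1]. Following the notation
in that paper, we have that $c = \sigma_{b,\varepsilon}$. On the other hand,

$$\begin{aligned}
m(t) &= \inf_{\|\theta\|=t}\,\inf_{\alpha\in\mathbb{R}} J_S[(1-\varepsilon)F_{H_0,\alpha,\theta} + \varepsilon\delta_0] \\
     &= \inf_{\|\theta\|=t} J_S[(1-\varepsilon)F_{H_0,0,\theta} + \varepsilon\delta_0] = \inf_{\|\theta\|=t} S(\theta),
\end{aligned}$$

where $S(\theta)$ is such that

$$(1-\varepsilon)E_{H_0}\rho\left(\frac{y-x'\theta}{S(\theta)}\right) = b. \tag{A.39}$$

Since $y - x'\theta$ is distributed as $(1 + \sum_i|\theta_i|)Z$, where $Z$ is standard Cauchy,
we have that (A.39) amounts to

$$g\left(\frac{S(\theta)}{1+\sum_i|\theta_i|}\right) = \frac{b}{1-\varepsilon}.$$

Therefore,

$$S(\theta) = \gamma_{b,\varepsilon}\left(1+\sum_i|\theta_i|\right)$$

and

$$m(t) = \inf_{\|\theta\|=t}\gamma_{b,\varepsilon}\left(1+\sum_i|\theta_i|\right) = (1+t)\gamma_{b,\varepsilon}. \tag{A.40}$$

Finally, since $B_S(\varepsilon) = t$, where $m(t) = \sigma_{b,\varepsilon}$, the result follows from (A.40).
□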
Proof of (4.24). Clearly, under the Cauchy model, the expression for
$r_{CM}(\varepsilon)$ is formally the same as that corresponding to the Gaussian model.
We just have to compute $g(s)$ with respect to the Cauchy distribution instead
of the normal. On the other hand, under the Cauchy model it is not difficult
to check that

$$m_{CM}(t) = \log(1+t) + \inf_{s \ge \gamma_{b,\varepsilon}} A_{c,\varepsilon}(s),$$

where $A_{c,\varepsilon}(s)$ is defined by (4.20). Since the bias satisfies $m_{CM}[B_{CM}(\varepsilon)] =
r_{CM}(\varepsilon)$, the result follows. □

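Proof of (4.25). The same arguments as above yield the following
expression for the function $m(t,s)$ under the Cauchy model:

$$m(t,s) = g_2\left(\frac{s}{1+t}\right) - g_2(s).$$

From this expression, the computation of $l(\varepsilon)$ and $u(\varepsilon)$ under the Cauchy
model is straightforward, as follows:

$$l(\varepsilon) = \frac{\sigma_{b,\varepsilon}}{g_2^{-1}[g_2(\sigma_{b,\varepsilon}) + \varepsilon/(1-\varepsilon)]} - 1$$

and

$$u(\varepsilon) = \frac{\underline{s}}{g_2^{-1}[g_2(\underline{s}) + \varepsilon/(1-\varepsilon)]} - 1.$$

Since we are assuming that $\phi_2(s)$ is unimodal, the same proof as in the case
of the Gaussian model yields (4.25). □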

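Proof of Theorem 6.1. (i) Computing the derivative of $A_{c,\varepsilon}(s)$ with
respect to $s$, we see that $A_{c,\varepsilon}(s)$ is nondecreasing when $c \le [(1-\varepsilon)\phi(s)]^{-1}$.
Since $1/K \le [(1-\varepsilon)\phi(s)]^{-1}$ for all $\varepsilon$ and $s > 0$, the condition $c \le 1/K$ implies
that $A_{c,\varepsilon}(s)$ is nondecreasing for all $\varepsilon$ and $s > 0$. As a consequence, $h_c(\varepsilon,\sigma) = 0$
for all $\varepsilon$ and $\sigma > 0$. Then $d_c(\varepsilon) = 0$ for all $\varepsilon$, which implies that $B_{CM}(\varepsilon) =
B_S(\varepsilon)$ for all $\varepsilon$.

(ii) Since $g(\sigma_{b,\varepsilon}) = (b-\varepsilon)/(1-\varepsilon)$ and $g(\gamma_{b,\varepsilon}) = b/(1-\varepsilon)$, it follows that

$$A_{c,\varepsilon}(\sigma_{b,\varepsilon}) < A_{c,\varepsilon}(\gamma_{b,\varepsilon}) \iff c > c(\varepsilon). \tag{A.41}$$

However, if $A_{c,\varepsilon}(\sigma_{b,\varepsilon}) < A_{c,\varepsilon}(\gamma_{b,\varepsilon})$, then $d_c(\varepsilon) > 0$ and hence $B_S(\varepsilon) < B_{CM}(\varepsilon)$.
□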

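Proof of (6.28). By using implicit differentiation, one obtains

$$\frac{d\sigma_{b,\varepsilon}}{d\varepsilon} = \frac{1}{(1-\varepsilon)^2}\,\frac{(1-b)\sigma_{b,\varepsilon}}{\phi(\sigma_{b,\varepsilon})} \quad \text{and} \quad \frac{d\gamma_{b,\varepsilon}}{d\varepsilon} = \frac{1}{(1-\varepsilon)^2}\,\frac{-b\,\gamma_{b,\varepsilon}}{\phi(\gamma_{b,\varepsilon})}.$$

This then gives

$$\frac{d[\varepsilon c(\varepsilon)]}{d\varepsilon} = \frac{1}{(1-\varepsilon)^2}\left(\frac{1-b}{\phi(\sigma_{b,\varepsilon})} + \frac{b}{\phi(\gamma_{b,\varepsilon})}\right) \ge \frac{1-b}{K} + \frac{b}{\phi(\gamma_{b,0})}. \tag{A.42}$$

The last inequality follows since, as noted previously, $\gamma_{b,\varepsilon} \le \sigma_{b,0} < \sigma_M$. This
then implies (6.28). □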
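
The two derivative formulas obtained by implicit differentiation can be sanity-checked against finite differences. The sketch below does this for one concrete choice that is ours, not the proof's: $\rho(u) = I\{|u| > 1\}$ under the Gaussian model, for which $g(s) = 2[1-\Phi(s)]$ and $\phi(s) = -s\,g'(s)$ have closed forms. All function names are hypothetical.

```python
from statistics import NormalDist

N = NormalDist()


def phi(s):
    """phi(s) = -s g'(s) for g(s) = 2 * (1 - Phi(s))."""
    return 2.0 * s * N.pdf(s)


def sigma(b, eps):
    """Solves g(sigma) = (b - eps) / (1 - eps)."""
    return N.inv_cdf(1.0 - (b - eps) / (2.0 * (1.0 - eps)))


def gamma(b, eps):
    """Solves g(gamma) = b / (1 - eps)."""
    return N.inv_cdf(1.0 - b / (2.0 * (1.0 - eps)))


def dsigma_deps(b, eps):
    # Closed form from implicit differentiation.
    s = sigma(b, eps)
    return (1.0 - b) * s / ((1.0 - eps) ** 2 * phi(s))


def dgamma_deps(b, eps):
    # Closed form from implicit differentiation.
    gm = gamma(b, eps)
    return -b * gm / ((1.0 - eps) ** 2 * phi(gm))


def central_difference(f, b, eps, h=1e-5):
    """Numerical derivative of f(b, .) at eps."""
    return (f(b, eps + h) - f(b, eps - h)) / (2.0 * h)


b, eps = 0.5, 0.1
print(dsigma_deps(b, eps), central_difference(sigma, b, eps))
print(dgamma_deps(b, eps), central_difference(gamma, b, eps))
```

Agreement between the closed forms and the numerical derivatives for this $\rho$ does not prove the general statement, but it guards against sign and factor slips in the formulas.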

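Proof of Theorem 6.2. For $c \le 1/K$, it has already been noted that
the maximum bias functions are the same, so we need only consider $1/K <
c \le c_0$. In general, for $c > 1/K$ and under Assumption A4, the function
$A_{c,\varepsilon}(s)$ has the following properties:

(i) $A_{c,\varepsilon}(0) = -\infty$ and $A_{c,\varepsilon}(\infty) = \infty$;

(ii) $A_{c,\varepsilon}(s)$ has two critical points, say $\sigma_L(c,\varepsilon) < \sigma_U(c,\varepsilon)$, with
$A_{c,\varepsilon}(s) \uparrow$ over 0 to $\sigma_L(c,\varepsilon)$,
$A_{c,\varepsilon}(s) \downarrow$ over $\sigma_L(c,\varepsilon)$ to $\sigma_U(c,\varepsilon)$ and
$A_{c,\varepsilon}(s) \uparrow$ over $\sigma_U(c,\varepsilon)$ to $\infty$;

(iii) $A_{c,\varepsilon}(s)$ is concave for $s < \sigma_M$ and convex for $s > \sigma_M$.

Note that the critical points of $A_{c,\varepsilon}(s)$ correspond to the two solutions
to $\phi(s) = 1/[(1-\varepsilon)c]$. The value of $\sigma_M$, though, does not depend on $c$ or $\varepsilon$.
Graphs of a typical function $A_{c,\varepsilon}(\sigma)$ for two different values of $\varepsilon$ are given
in Figure 5.

Fig. 5. Graph of $A_{c,\varepsilon}(\sigma)$.

Some further properties, easily verified, are the following:

(a) $\gamma_{b,\varepsilon}$, $\sigma_{b,\varepsilon}$, $\sigma_L(c,\varepsilon)$, $\sigma_U(c,\varepsilon)$ and $A_{c,\varepsilon}(s)$ are continuous in $\varepsilon$;

(b) As $\varepsilon \uparrow$: $\gamma_{b,\varepsilon} \downarrow$, $\sigma_{b,\varepsilon} \uparrow$, $\sigma_L(c,\varepsilon) \uparrow$, $\sigma_U(c,\varepsilon) \downarrow$ and $A_{c,\varepsilon}(s) \downarrow$;

(c) $\gamma_{b,\varepsilon} \le \sigma_{b,\varepsilon}$ with $\gamma_{b,0} = \sigma_{b,0}$;

(d) if $\gamma < \sigma$, then $A_{c,\varepsilon}(\gamma) - A_{c,\varepsilon}(\sigma)$ is decreasing in $\varepsilon$.

Now, for $1/K < c \le c_0$,

$$\text{if } \sigma_{b,\varepsilon} \le \sigma_U(c,\varepsilon), \text{ then } B_{CM}(\varepsilon) \le B_S(\varepsilon), \tag{A.43}$$

since, in this case, $d_c(\varepsilon) \le 0$. So, to prove Theorem 6.2, it only needs to be
shown that

$$\text{if } \sigma_{b,\varepsilon} > \sigma_U(c,\varepsilon), \text{ then } A_{c,\varepsilon}(\gamma_{b,\varepsilon}) \le A_{c,\varepsilon}(\sigma_U(c,\varepsilon)), \tag{A.44}$$

since this implies $d_c(\varepsilon) = 0$ and hence $B_{CM}(\varepsilon) = B_S(\varepsilon)$.

To show (A.44), first note that $\sigma_{b,0} < \sigma_M$ since $g(\sigma_{b,0}) = b > g(\sigma_M)$. Thus,
since $\sigma_{b,\varepsilon} \uparrow$ and $\sigma_U(c,\varepsilon) \downarrow$ as $\varepsilon$ increases and both are continuous, there exists
an $\varepsilon_b$ such that $\sigma_{b,\varepsilon_b} = \sigma_U(c,\varepsilon_b)$. For any $\varepsilon < \varepsilon_b$, it then follows that $\sigma_{b,\varepsilon} <
\sigma_{b,\varepsilon_b} = \sigma_U(c,\varepsilon_b)$, so to show (A.44), it is only necessary to consider $\varepsilon \ge \varepsilon_b$.

For $\varepsilon \ge \varepsilon_b$, we have

$$A_{c,\varepsilon_b}(\gamma_{b,\varepsilon}) \le A_{c,\varepsilon_b}(\gamma_{b,\varepsilon_b}) \le A_{c,\varepsilon_b}(\sigma_{b,\varepsilon_b}) = A_{c,\varepsilon_b}(\sigma_U(c,\varepsilon_b)) \le A_{c,\varepsilon_b}(\sigma_U(c,\varepsilon)).$$

The first inequality follows since $\gamma_{b,\varepsilon_b} \le \sigma_L(c,\varepsilon_b)$, the second inequality
follows from (A.41) and the third inequality follows from (b) since $\sigma_U(c,\varepsilon_b) \ge
\sigma_U(c,\varepsilon) \ge \sigma_M$. Statement (A.44) then follows from (d) above. □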

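Proof of Remark 6.1. The remark has already been established for
$c > c(\varepsilon)$ and for $c \le 1/K$. If $c > 1/K$ and $g(\sigma_M) \ge b$, then $\gamma_{b,0} = \sigma_{b,0} \ge \sigma_M$.
Now, if $\sigma_{b,0} \ge \sigma_U(c,0)$, then since $\sigma_{b,\varepsilon} \uparrow$ and $\sigma_U(c,\varepsilon) \downarrow$ as $\varepsilon$ increases, it follows
that $\sigma_{b,\varepsilon} \ge \sigma_U(c,\varepsilon)$ for all $\varepsilon$. This then implies that $d_c(\varepsilon) \ge 0$ and hence that
$B_{CM}(\varepsilon) \ge B_S(\varepsilon)$.

On the other hand, if $\sigma_M < \sigma_{b,0} < \sigma_U(c,0)$, then by continuity, for
sufficiently small $\varepsilon$, we have $\sigma_M < \gamma_{b,\varepsilon} < \sigma_{b,\varepsilon} < \sigma_U(c,\varepsilon)$. This implies that
$A_{c,\varepsilon}(\gamma_{b,\varepsilon}) > A_{c,\varepsilon}(\sigma_{b,\varepsilon})$, so by (A.41), $c > c(\varepsilon)$. □

Proof of Theorem 6.3. Note that under the conditions of Theorem 6.2,
$B_S(\varepsilon) > B_{CM}(\varepsilon)$ if and only if

$$\sigma_{b,\varepsilon} < \sigma_U(c,\varepsilon) \quad \text{and} \quad A_{c,\varepsilon}(\sigma_{b,\varepsilon}) > A_{c,\varepsilon}(\sigma_U(c,\varepsilon)). \tag{A.45}$$

So, to prove that an S-functional is inadmissible, one only needs to establish
(A.45) for some $\varepsilon$. First, we will show that the condition $c_0 = \lim_{\varepsilon\to 0^+} c(\varepsilon)$
implies that there exists some $\varepsilon$ such that (A.45) holds. Then we will show
that (6.29) is enough to guarantee that $c_0 = \lim_{\varepsilon\to 0^+} c(\varepsilon)$. Note that by using
l'Hôpital's rule, one obtains

$$c(0) = \lim_{\varepsilon\to 0^+} c(\varepsilon) = \frac{1}{\phi(\sigma_{b,0})}. \tag{A.46}$$

Also, note that

$$c > c_1 = \frac{\log(\sigma_M/\sigma_{b,0})}{b-g(\sigma_M)} \iff A_{c,0}(\sigma_{b,0}) > A_{c,0}(\sigma_M). \tag{A.47}$$

Since $\sigma_{b,0} < \sigma_M$, this implies that $c_1 > 1/K$ since, otherwise, $A_{c,0}(s)$ would
be monotone in $s$. Now, for any $c > c_1$, we then have $\sigma_{b,0} < \sigma_M < \sigma_U(c,0)$ and
$A_{c,0}(\sigma_{b,0}) > A_{c,0}(\sigma_M) > A_{c,0}(\sigma_U(c,0))$. By continuity, statement (A.45) then
follows for sufficiently small $\varepsilon$. Now, we show that $c_1 < c(0)$. To show this,
note that when $c = c(0)$, we have $\sigma_{b,0} = \sigma_L(c,0)$ and so $A_{c,0}(\sigma_{b,0}) > A_{c,0}(\sigma_M)$.
The first part of the proof then follows from (A.47).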

Note that the lower bound ci can be tightened by working with (A.45) 
directly. In general, it is difficult to use (A.45) to obtain a closed form 
expression, but it can be used for specific examples. 

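From (A.46), in the second part of the proof, we need to show that (6.29)
implies

$$\varepsilon\, c(\varepsilon) \ge \varepsilon/\phi(\sigma_{b,0}). \tag{A.48}$$

Since equality holds in (A.48) when $\varepsilon = 0$, to show (A.48), it is sufficient to
prove that the derivative of the left-hand side is never less than the derivative
of the right-hand side, that is [see Equations (A.42) and (A.46)],

$$\frac{1}{(1-\varepsilon)^2}\left\{\frac{1-b}{\phi(\sigma_{b,\varepsilon})} + \frac{b}{\phi(\gamma_{b,\varepsilon})}\right\} \ge \frac{1}{\phi(\sigma_{b,0})}. \tag{A.49}$$

Recall that we are assuming $g(\sigma_M) < b = g(\sigma_{b,0})$ or, equivalently, that $\sigma_{b,0} <
\sigma_M$. This implies $\phi(\gamma_{b,\varepsilon}) \le \phi(\sigma_{b,0})$ and after some simple algebraic
manipulations, we note that (A.49) holds if

$$a_{b,\varepsilon}\,\phi(\sigma_{b,\varepsilon}) \le \phi(\sigma_{b,0}), \tag{A.50}$$

where $a_{b,\varepsilon} = [(1-\varepsilon)^2 - b]/(1-b)$.

Since $\sigma_{b,\varepsilon}$ is increasing in $\varepsilon$, it follows that $\phi(\sigma_{b,\varepsilon})$ is decreasing in $\varepsilon$
whenever $\sigma_{b,\varepsilon} \ge \sigma_M$ and hence that if (A.50) holds for $\sigma_{b,\varepsilon} = \sigma_M$, then it
also holds for $\sigma_{b,\varepsilon} > \sigma_M$. Thus, it is sufficient to show that (A.50) holds for
$\sigma_{b,\varepsilon} \le \sigma_M$ or, equivalently, for

$$\varepsilon \le \varepsilon_M = \frac{b-g(\sigma_M)}{1-g(\sigma_M)}.$$

Given that $g(s)$ is convex, we have $-g'(\sigma_{b,\varepsilon}) \le -g'(\sigma_{b,0})$, so (A.50) holds if
$a_{b,\varepsilon}\,\sigma_{b,\varepsilon} \le \sigma_{b,0}$. Since $g(s)$ is also nonincreasing, this is equivalent to

$$g(a_{b,\varepsilon}\,\sigma_{b,\varepsilon}) \ge b. \tag{A.51}$$

Thus, the theorem is proved if (A.51) holds for $\varepsilon \le \varepsilon_M$. By the convexity of
$g(s)$, for $\varepsilon \le \varepsilon_M$, we have

$$g(a_{b,\varepsilon}\,\sigma_{b,\varepsilon}) \ge g(\sigma_{b,\varepsilon}) + (a_{b,\varepsilon}-1)\,\sigma_{b,\varepsilon}\,g'(\sigma_{b,\varepsilon}) \ge \frac{b-\varepsilon}{1-\varepsilon} + (1-a_{b,\varepsilon})\,\phi(\sigma_{b,0}).$$

The last term is $\ge b$ if and only if

$$\phi(\sigma_{b,0}) \ge \frac{(1-b)^2}{(1-\varepsilon)(2-\varepsilon)}. \tag{A.52}$$

Note that if (A.52) holds for $\varepsilon = \varepsilon_M$, then it holds for all $\varepsilon \le \varepsilon_M$. With
$\varepsilon = \varepsilon_M$, though, (A.52) corresponds to the bound (6.29). This completes the
proof. □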